NSF Award: Exploiting the Massive User Generated Utterances for Intent Mining under Scarce Annotations

Award number: NSF 1909323

Duration (expected): 3 years (10/01/2019 - 09/30/2022)

Award title: Exploiting the Massive User Generated Utterances for Intent Mining under Scarce Annotations

Principal Investigator: Philip S. Yu (psyu@cs.uic.edu)

  • Chenwei Zhang (cwzhang910@gmail.com) (Alumni)
  • Congying Xia (cxia8@uic.edu)
  • Ye Liu (yliu279@uic.edu)
  • Tao Zhang (tzhang90@uic.edu)

    Project Goals: With the advance of artificial intelligence and machine learning technology, users interact with computational devices through spoken language to search for information or accomplish tasks, as is evident from voice-based personal assistant products in smart home, automobile, education, healthcare, retail, and telecommunications environments. This project studies user intent mining, which aims to understand the underlying goals or purposes behind user-generated utterances. For example, by asking a personal assistant "should I bring an umbrella tomorrow?", a user reveals the intention of getting weather information. Intent mining has been an elusive goal for information search due to diverse, implicit expressions in questions, and it is even harder for task accomplishment in conversational systems. For example, given the voice command "book a restaurant near me", the system should learn to follow up with questions about date or dietary preferences and refine the task goal, i.e., the intent, according to the user's response. This project explores new computational techniques to understand user-generated utterances while addressing the scarcity of annotation data available for intent mining. The research findings and insights are expected to lead to better natural language understanding and dialogue management with reduced requirements on human annotation effort. The proposed research will be applicable to the design of new question/conversation understanding systems that improve service and user satisfaction at reduced annotation cost. The research projects will engage graduate and undergraduate students. Research findings will be incorporated into the course curriculum.

    The proposed project provides major advancements to the foundation of intent mining from user-generated utterances by formulating four fundamental intent mining tasks that cover the discovery, annotation, unsupervised learning, and sequential modeling phases of mining user intentions. The research tasks share a specific and consistent focus on the labeling scarcity issue, since it is time-consuming and labor-intensive to obtain large-scale labeled data in which user intents are accurately defined and correctly annotated from diverse and noisy utterances. The project will include the development of principles, models, and algorithms for intent discovery, joint intent and slot annotation, unsupervised intent learning, and intent evolvement modeling. A variety of learning schemes, such as zero-shot learning, reinforcement learning, generative modeling, and multi-modal learning, will be introduced for the increasingly common scenario in which there is not enough annotated data for current learning approaches to succeed out of the box. The research team plans to share results, including datasets and software, with the research community to facilitate future studies.

    Research Challenges: The ability to understand, reason, and generalize is central to human intelligence. However, it poses great challenges for a machine to detect, annotate, and model user intentions from diversely expressed utterances, especially under scarce annotations. The proposed study provides major advancements to the foundation of intention mining, with a special focus on alleviating annotation scarcity, by formulating the definitions and paradigms that facilitate further research. In addition, the data-driven intention mining tasks proposed here will significantly enhance and streamline existing paradigms and can be directly applied not only to question answering but also to dialogue systems and chatbots that involve either human-human or human-machine interactions.

    Recent Activity:

  • Question Answering: In real-world question-answering systems, ill-formed questions, with wrong words, incorrect word order, or noisy expressions, are common and may prevent the QA system from understanding the question and answering it accurately. To reduce the effect of ill-formed questions, we formulate a question refinement task and address it with deep reinforcement learning techniques.
  • Reading Comprehension: Humans tackle reading comprehension not only by using the given context but often also by relying on commonsense knowledge beyond it. We enable the machine to solve reading comprehension tasks with commonsense reasoning.
  • Intent Detection: Intent detection is a key task in natural language understanding. We study intent detection in low-resource scenarios in which only a few examples are available. We propose to alleviate the scarce annotation problem by generating labeled examples with natural language generation models.
  • Named Entity Typing: Named entity typing is a classification task that assigns semantic types from a given set to an entity mention in context. We are interested in extending a model's ability to perform zero-shot fine-grained named entity typing.

    Recent Publications:

  • Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, Philip S. Yu. Entity Synonym Discovery via Multi-piece Bilateral Context Matching. In Proceedings of the 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI), 2020. [Paper] [Code&Data]
  • Ye Liu, Chenwei Zhang, Xiaohui Yan, Yi Chang, Philip S. Yu. Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM), 2019. [Paper]
  • Ye Liu, Tao Yang, Zeyu You, Wei Fan and Philip S Yu. Commonsense Evidence Generation and Injection in Reading Comprehension. In Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2020. [Paper]
  • Tingting Liang, Congying Xia, Yuyu Yin, Philip S. Yu. Joint Training Capsule Network for Cold Start Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2020. [Paper]
  • Congying Xia, Chenwei Zhang, Hoang Nguyen, Jiawei Zhang and Philip S Yu. CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection. [Paper]
  • Tao Zhang, Congying Xia, Chun-Ta Lu and Philip S Yu. MZET: Memory Augmented Zero-Shot Fine-grained Named Entity Typing. [Paper]

    Related Publications:

  • Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, Philip S. Yu. Joint Slot Filling and Intent Detection via Capsule Neural Networks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019. [Paper] [Poster] [Code&Data]
  • Congying Xia*, Chenwei Zhang*, Xiaohui Yan, Yi Chang, and Philip Yu. Zero-shot User Intent Detection via Capsule Neural Networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. (* denotes equal contribution) [Paper] [Video] [Code&Data]
  • Chenwei Zhang, Wei Fan, Nan Du, Yaliang Li, Chun-Ta Lu, and Philip S. Yu. Bringing Semantic Structures to User Intent Detection in Online Medical Queries. In Proceedings of the IEEE International Conference on Big Data (Big Data), 2017. [Paper] [Slides]
  • Chenwei Zhang, Wei Fan, Nan Du and Philip S. Yu. Mining User Intentions from Medical Queries: A Neural Network Based Heterogeneous Jointly Modeling Approach. In Proceedings of the 25th International World Wide Web Conference (WWW), 2016. [Paper] [Slides]

    Acknowledgement: This material is based upon work supported by the National Science Foundation under Grant No. 1909323.

    Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.