NSF Award: Exploiting the Massive User Generated Utterances for Intent Mining under Scarce Annotations

Award number: NSF 1909323

Duration (expected): 3 years (10/01/2019 - 09/30/2022)

Award title: Exploiting the Massive User Generated Utterances for Intent Mining under Scarce Annotations

Principal Investigator: Philip S. Yu (psyu@cs.uic.edu)

  • Chenwei Zhang (cwzhang910@gmail.com) (Alumnus)
  • Congying Xia (cxia8@uic.edu)
  • Project Goals: With advances in artificial intelligence and machine learning, users increasingly interact with computational devices through spoken language to search for information or accomplish tasks, as evidenced by voice-based personal assistant products in smart home, automobile, education, healthcare, retail, and telecommunications environments. This project studies user intent mining, which aims to understand the underlying goals or purposes behind user-generated utterances. For example, by asking a personal assistant "should I bring an umbrella tomorrow?", a user reveals the intention of getting weather information. Intent mining has been an elusive goal for information search because questions express intents in diverse and implicit ways, and it is even harder for task accomplishment in conversational systems. For example, given the voice command "book a restaurant near me", the system should learn to follow up with questions about the date or dietary preferences and refine the task goal, i.e., the intent, according to the user's responses. This project explores new computational techniques to understand user-generated utterances while addressing the scarcity of annotated data available for intent mining. The research findings and insights are expected to lead to better natural language understanding and dialogue management with reduced human annotation effort. The proposed research will be applicable to the design of new question and conversation understanding systems that improve service quality and user satisfaction at a reduced annotation cost. The research projects will engage graduate and undergraduate students, and research findings will be incorporated into the course curriculum.

    The proposed project provides major advancements to the foundation of intent mining from user-generated utterances by formulating four fundamental intent mining tasks that cover the discovery, annotation, unsupervised learning, and sequential modeling phases of mining user intentions. All research tasks share a specific and consistent focus on the label scarcity issue, since it is time-consuming and labor-intensive to obtain large-scale labeled data in which user intents are accurately defined and correctly annotated from diverse and noisy utterances. The project will include the development of principles, models, and algorithms for intent discovery, joint intent and slot annotation, unsupervised intent learning, and intent evolution modeling. Diverse learning schemes such as zero-shot learning, reinforcement learning, generative modeling, and multi-modal learning will be introduced for the increasingly common scenario in which there is not enough annotated data for current learning approaches to succeed out of the box. The research team plans to share results, including datasets and software, with the research community to facilitate future studies.

    Research Challenges: The ability to understand, reason, and generalize is central to human intelligence. However, it poses great challenges for machines to detect, annotate, and model user intentions from diversely expressed utterances, especially under scarce annotations. The proposed study provides major advancements to the foundation of intention mining, with a special focus on alleviating annotation scarcity, by formulating definitions and paradigms that facilitate further research. In addition, the data-driven intention mining tasks proposed here will significantly enhance and streamline existing paradigms and can be directly applied to various scenarios, not only in question answering but also in dialogue systems and chatbots that involve either human-human or human-machine interactions.


  • Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, and Philip S. Yu. Joint Slot Filling and Intent Detection via Capsule Neural Networks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019. [Paper] [Poster] [Code&Data]
  • Congying Xia*, Chenwei Zhang*, Xiaohui Yan, Yi Chang, and Philip S. Yu. Zero-shot User Intent Detection via Capsule Neural Networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. (* denotes equal contribution) [Paper] [Video] [Code&Data]
  • Chenwei Zhang, Wei Fan, Nan Du, Yaliang Li, Chun-Ta Lu, and Philip S. Yu. Bringing Semantic Structures to User Intent Detection in Online Medical Queries. In Proceedings of the IEEE International Conference on Big Data (Big Data), 2017. [Paper] [Slides]
  • Chenwei Zhang, Wei Fan, Nan Du, and Philip S. Yu. Mining User Intentions from Medical Queries: A Neural Network Based Heterogeneous Jointly Modeling Approach. In Proceedings of the 25th International World Wide Web Conference (WWW), 2016. [Paper] [Slides]
  • Acknowledgement: This material is based upon work supported by the National Science Foundation under Grant No. 1909323.

    Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.