-Large graph database management: graph search and indexing on a database of large graphs, and on a single large network.
-Scalable machine mining on large graph(s): community detection, link inference, and collective classification.
-Subgraph pattern mining: finding and extracting useful information from graph-structured data sets (e.g., molecular structure graphs) to discover significant features.
Social Network Analysis: Powered by cloud and MapReduce infrastructure, social network platforms are gathering data on many aspects of our daily lives. Motivated by this trend, our research addresses interesting phenomena on social networks, including the following topics:
-Network structure and macro social pattern mining: magnet community detection, and social influence evaluation.
-Influence propagation and social activity mining: social sharing temporal pattern mining, spam detection, and social advertising.
-Role discovery: finding the most influential nodes.
Mining Brain Data: The human brain is one of the most complicated biological structures in the known universe. It is very challenging to understand how it works, especially when disorders and diseases occur. Multiple data representations are usually involved, including neuroimaging tensor data, brain network data, and multi-view biomarkers.
Learning from Multiple Data Sources: Multiple related data sources containing different types of features may be available for a given task. For instance, users’ profiles can be used to build recommendation systems; in addition, a model can use users’ historical behaviors and social networks to infer their interests in related products. It is desirable to use all available heterogeneous data sources jointly to build effective learning models; relevant approaches include transfer learning, crowdsourcing, and heterogeneous learning.
Multi-label Learning: Many real-world classification tasks involve multiple concepts instead of a single concept, and each data object can be assigned multiple concepts (class labels) simultaneously. Multi-label learning aims at building accurate classification models that can predict multiple concepts collectively for each object.
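To make the multi-label setting concrete, here is a minimal sketch in plain Python. It only illustrates how label sets can be encoded and compared; it is not one of our models. The class names, the helper functions `to_indicator` and `hamming_loss`, and the toy data are all assumptions chosen for illustration.

```python
from typing import List, Set


def to_indicator(label_sets: List[Set[str]], classes: List[str]) -> List[List[int]]:
    """Encode each object's label set as a 0/1 vector, one position per class."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (object, label) pairs on which prediction and truth disagree."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


# Hypothetical toy data: each object may carry zero, one, or several labels.
classes = ["politics", "sports", "technology"]
true_sets = [{"politics", "technology"}, {"sports"}, set()]
pred_sets = [{"politics"}, {"sports", "technology"}, set()]

y_true = to_indicator(true_sets, classes)
y_pred = to_indicator(pred_sets, classes)
loss = hamming_loss(y_true, y_pred)
```

Here two of the nine (object, label) pairs are wrong, so `loss` is 2/9. Hamming loss treats each label independently; measures that score the whole label set at once are stricter and may suit tasks where labels are strongly correlated.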
Stream Mining: Designing efficient real-time algorithms for continuous data streams, especially graph streams.
Heterogeneous Information Networks: Many real-world networks, such as social networks and information systems, involve a large number of components: multiple types of entities interconnected by different types of relations. We call these networks heterogeneous information networks; they are critical for modern information infrastructure.
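As a rough illustration of the data structure, a heterogeneous information network can be stored as typed nodes plus adjacency sets keyed by relation type. The sketch below is an assumption made for illustration only: the class `HeteroNetwork`, its method names, and the node and relation types (`user`, `product`, `follows`, `bought`) are hypothetical and do not describe a specific system.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class HeteroNetwork:
    """Toy heterogeneous network: every node has a type, every edge a relation."""

    def __init__(self) -> None:
        self.node_type: Dict[str, str] = {}
        # (source node, relation) -> set of destination nodes
        self.adj: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add_node(self, node: str, ntype: str) -> None:
        self.node_type[node] = ntype

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        # Reject edges whose endpoints were never declared with a type.
        for node in (src, dst):
            if node not in self.node_type:
                raise KeyError(f"unknown node: {node}")
        self.adj[(src, relation)].add(dst)

    def neighbors(self, node: str, relation: str) -> Set[str]:
        """Nodes reachable from `node` through one edge of the given relation."""
        return set(self.adj.get((node, relation), set()))


# Hypothetical example: two entity types and two relation types.
net = HeteroNetwork()
net.add_node("alice", "user")
net.add_node("bob", "user")
net.add_node("laptop", "product")
net.add_edge("alice", "follows", "bob")
net.add_edge("bob", "bought", "laptop")

# Products bought by the users that alice follows (a two-step typed walk).
suggested = {
    product
    for friend in net.neighbors("alice", "follows")
    for product in net.neighbors(friend, "bought")
}
```

Keeping the relation type in the adjacency key lets a query follow only the edge types it cares about, which is the main practical difference from a homogeneous graph.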
Mining Uncertain and Incomplete Data: Most real-world data today is neither certain nor complete, which makes it challenging to apply conventional data mining methods. We aim to design effective models for knowledge discovery from uncertain and incomplete data.
Review Spam Detection: As well-organized spammers adopt smart strategies for spamming review websites (e.g., amazon.com, yelp.com), traditional language-based and feature-extraction-based spam detection methods become less effective. It is desirable to design a time series pattern correlation method and a graph-based relational reinforcement model to catch the most prevalent and crafty spam reviews.
Privacy Preserving Data Publishing: Privacy-preserving data publishing provides methods and tools for publishing useful information while preserving data privacy.