Research Topics

The MinD research group aims at developing algorithms for data mining and machine learning, with a focus on large-scale data. In particular, MinD has expertise in Concept Lattices, Evolutionary Computation, Multi-Agent Systems, Naïve Bayes, Random Forests, Support Vector Machines, Boosting, Deep Learning, … These methods are used to extract knowledge from Big Data for:

  • Association rule learning
  • Classification
  • Clustering
  • Optimization
  • Prediction
  • Regression

Concept Lattices

Figure: example dataset and corresponding concept lattice.

Concept lattices are theoretical structures defined by the Galois connection of a finite binary relation. Given a set of instances (objects) described by a list of properties (variable values), the concept lattice is a hierarchy of concepts in which each concept associates a set of instances (extent) sharing the same values for a certain set of properties (intent). Concepts are partially ordered in the lattice according to the inclusion relation: each sub-concept in the lattice contains a subset of the instances and a superset of the properties of the related concepts above it. In data mining, concept lattices serve as a theoretical framework for the efficient extraction of lossless condensed representations of association rules, the generation of classification rules, and hierarchical biclustering.
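As a minimal illustration of these definitions, the following Python sketch enumerates the formal concepts of a small toy context (an assumed example, not the group's actual data) by closing the set of intents under intersection and pairing each intent with its extent:

    # Toy formal context: objects described by binary attributes
    # (illustrative assumption, not the group's data).
    context = {
        "duck":    {"flies", "swims", "lays_eggs"},
        "eagle":   {"flies", "lays_eggs"},
        "penguin": {"swims", "lays_eggs"},
        "dog":     {"swims"},
    }
    all_attrs = set().union(*context.values())

    def extent(intent_):
        """Derivation operator: objects having every attribute of the intent."""
        return frozenset(o for o, attrs in context.items() if intent_ <= attrs)

    def intent(objects):
        """Dual derivation operator: attributes shared by every object of the extent."""
        if not objects:
            return frozenset(all_attrs)
        return frozenset(set.intersection(*(context[o] for o in objects)))

    # The set of intents is closed under intersection: start from the top
    # intent (all attributes) and intersect with object intents to a fixpoint.
    intents = {frozenset(all_attrs)}
    changed = True
    while changed:
        changed = False
        for i in list(intents):
            for attrs in context.values():
                closed = frozenset(i & attrs)
                if closed not in intents:
                    intents.add(closed)
                    changed = True

    # Each concept is a pair (extent, intent); smaller extents pair with
    # larger intents, which gives the partial order of the lattice.
    for i in sorted(intents, key=lambda s: (len(extent(s)), sorted(s))):
        assert intent(extent(i)) == i          # closure under the Galois connection
        print(sorted(extent(i)), "|", sorted(i))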


Evolutionary Computation

Evolutionary Algorithms (EA) are nature-inspired, stochastic algorithms that mimic Darwinian evolution for problem optimization. The particularity of EA is their capacity to deal with multiple objectives (e.g. maximizing profits while minimizing costs) and with multi-modality (several best solutions), since the algorithm considers a population of solutions, as well as with discrete or continuous optimization, dynamic optimization and many other fundamental problems… As data mining now deals with Big Data, it is natural to consider EA for optimizing the models (neural networks, association rules, decision trees, SVMs…) produced by a mining or learning process. They can also be considered for hybridization.
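As a hedged sketch of the basic evolutionary loop (selection, crossover, mutation), the toy genetic algorithm below maximizes a simple one-dimensional function; the objective, operators and hyper-parameters are illustrative assumptions rather than anything used by the group:

    import random

    def fitness(x):
        # Toy objective with its maximum at x = 3.
        return -(x - 3.0) ** 2

    def evolve(pop_size=30, generations=100, mutation_std=0.3):
        # Initial population of candidate solutions drawn at random.
        population = [random.uniform(-10.0, 10.0) for _ in range(pop_size)]
        for _ in range(generations):
            # Selection: keep the fitter half of the population as parents.
            population.sort(key=fitness, reverse=True)
            parents = population[: pop_size // 2]
            # Crossover (averaging two parents) followed by Gaussian mutation.
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                children.append((a + b) / 2.0 + random.gauss(0.0, mutation_std))
            population = parents + children
        return max(population, key=fitness)

    print(evolve())   # converges close to 3.0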

Multi-Agent Systems


Naïve Bayes


Random Forests

We use Random Forests (RF) for classification and prediction in many different fields. They are quite efficient with high-dimensional data, and the results obtained are often better than those of other classical methods. RF is a supervised approach: the samples used for creating the trees are labeled and separated into two subsets, a training set and a test set. The training set is used to construct the trees of the forest, and the test set is used to validate the created forest. Two specific techniques are often used in the construction of the trees: Bootstrap Aggregating (Bagging) to select a subset of the training set for each tree, and Random Feature Selection to select a subset of the features characterizing each sample. The best feature to split on at a node of a tree is chosen from this subset. We use RF for short-text classification, body-action recognition with one or several Kinects, classification and prediction of coastal currents, classification and prediction for air pollution, and prediction for the auto-adaptation of many sensors (ubiquitous programming).
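A minimal sketch of this workflow, assuming scikit-learn and its Iris toy dataset as stand-ins for the group's actual data and tooling:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Labeled samples are separated into a training set and a test set.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Each tree is grown on a bootstrap sample of the training set (bagging),
    # and at each node only a random subset of the features is considered
    # when choosing the best split.
    forest = RandomForestClassifier(
        n_estimators=100, max_features="sqrt", bootstrap=True, random_state=0)
    forest.fit(X_train, y_train)

    # The held-out test set validates the created forest.
    print("test accuracy:", forest.score(X_test, y_test))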


Support Vector Machines

In machine learning, support vector machines (SVMs, also support vector networks[1]) are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
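A brief sketch of a non-linear SVM using the kernel trick, assuming scikit-learn and a synthetic two-class dataset (both are illustrative choices, not the group's setup):

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Two-class data that is not linearly separable in the input space.
    X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # The RBF kernel implicitly maps inputs into a high-dimensional feature
    # space; the linear separator found there corresponds to a non-linear
    # decision boundary in the original input space.
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))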


Boosting

Boosting is a machine learning ensemble meta-algorithm primarily for reducing bias, and also variance,[1] in supervised learning, and a family of machine learning algorithms which convert weak learners into strong ones.[2] Boosting is based on the question posed by Kearns and Valiant (1988, 1989):[3][4] can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier that is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.
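As one concrete instance, the sketch below (assuming scikit-learn and its breast-cancer toy dataset, both illustrative choices) uses AdaBoost, whose default weak learner is a depth-1 decision stump, to build a strong ensemble classifier:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # AdaBoost reweights the training samples at each round so that later
    # weak learners (by default, depth-1 decision stumps) focus on the
    # examples misclassified so far, then combines them by weighted vote.
    boosted = AdaBoostClassifier(n_estimators=200, random_state=0)
    boosted.fit(X_train, y_train)
    print("test accuracy:", boosted.score(X_test, y_test))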


Deep Learning