
Mesurer la qualité des règles d'association : études formelles et expérimentales [Measuring the quality of association rules: formal and experimental studies]

Abstract : Knowledge discovery in databases aims at extracting information contained in data warehouses. It is a complex process, in which several experts (those acquainted with the data, analysts, processing specialists, etc.) must act together in order to reveal patterns, which will be evaluated according to several criteria: validity, novelty, understandability, exploitability, etc. Depending on the application field, these criteria may be related to differing concepts. In addition, constant improvements in the methodological and technical aspects of data mining make it possible to deal with ever-larger databases. The number of extracted patterns follows the same increasing trend, although not all of them are valid. It is commonly assumed that the decision maker, who is usually in charge of validating the mined knowledge, cannot perform this step without some automated help. In order to carry out this final validation task, a typical approach relies on functions which numerically quantify the pertinence of the patterns. Since such functions, called interestingness measures, induce an order on the patterns, they highlight some specific kind of information. Many measures have been proposed, each of them being related to a particular category of situations. We here address the issue of evaluating the objective interestingness of one particular type of pattern, association rules, through the use of such measures. Considering that the selection of "good" rules implies the use of appropriate measures, we propose a systematic study of the latter, based on formal properties expressed in the most straightforward terms. From this study, we obtain a clustering of many commonly used measures, which we compare with the results of an experimental approach based on comparing the rankings induced by these measures on classical datasets. Analysing these properties enabled us to highlight some particularities of the measures.
We deduce a generalised framework that includes a large majority of them. We also apply two Multicriteria Decision Aiding methods in order to solve the issue of retaining pertinent rules. The first approach relies on a model of the preferences that an expert in the field being mined expresses about the previously defined properties. From this model, we establish which measures are best suited to the specific context. The second approach addresses the problem of the potentially differing values that the measures take, and builds an aggregated view of the ordering of the rules by taking into account the differences in evaluations. These methods are applied to practical situations. This work also led us to develop powerful dedicated software, Herbs. We present the processing it allows for rule selection purposes, as well as for the analysis of the behaviour of measures and visualisation aspects. Without any claim to exhaustiveness in our study, the methodology we propose can be extended to new measures or properties, and is applicable to other data mining contexts.
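As a rough illustration of how interestingness measures induce differing orders on the same rules, the sketch below ranks four hypothetical rules A -> B with three widely known measures (support, confidence and lift) and compares the rankings with Kendall's tau. The rule counts, the choice of measures and the use of Kendall's tau are assumptions made for this example only; they are not taken from the thesis or from the Herbs software.

```python
from itertools import combinations

# Hypothetical toy data: each rule A -> B is described by counts over N
# transactions: (n_a, n_b, n_ab) = transactions containing A, B, and both.
N = 100
RULES = {
    "r1": (40, 50, 30),
    "r2": (10, 80, 9),
    "r3": (60, 20, 15),
    "r4": (25, 25, 20),
}


def support(n, n_a, n_b, n_ab):
    """Fraction of transactions containing both A and B."""
    return n_ab / n


def confidence(n, n_a, n_b, n_ab):
    """Fraction of transactions containing A that also contain B."""
    return n_ab / n_a


def lift(n, n_a, n_b, n_ab):
    """Ratio of the observed co-occurrence to that expected under independence."""
    return n * n_ab / (n_a * n_b)


def ranking(measure):
    """Rule names sorted from highest to lowest value of `measure`."""
    return sorted(RULES, key=lambda r: measure(N, *RULES[r]), reverse=True)


def kendall_tau(rank_x, rank_y):
    """Kendall's tau between two rankings of the same items (no ties assumed)."""
    pos_x = {r: i for i, r in enumerate(rank_x)}
    pos_y = {r: i for i, r in enumerate(rank_y)}
    concordant = discordant = 0
    for a, b in combinations(rank_x, 2):
        if (pos_x[a] - pos_x[b]) * (pos_y[a] - pos_y[b]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    measures = {"support": support, "confidence": confidence, "lift": lift}
    for name, m in measures.items():
        print(name, ranking(m))
    for (name_x, mx), (name_y, my) in combinations(measures.items(), 2):
        print(name_x, "vs", name_y, kendall_tau(ranking(mx), ranking(my)))
```

On this toy data, support and lift order the rules similarly, whereas confidence largely disagrees with support. Differences of this kind are what a comparison of rankings can reveal, and what an aggregation of several measures would have to reconcile.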
Document type :
Theses

https://hal-imt-atlantique.archives-ouvertes.fr/tel-03247307
Contributor : Nathalie Fontaine
Submitted on : Thursday, June 3, 2021 - 8:39:55 AM
Last modification on : Wednesday, June 16, 2021 - 3:03:09 AM
Long-term archiving on : Saturday, September 4, 2021 - 6:05:24 PM

File

vaillant_PhD_2006.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-03247307, version 1

Citation

Benoît Vaillant. Mesurer la qualité des règles d'association : études formelles et expérimentales. Mathématiques [math]. Ecole nationale supérieure des télécommunications de Bretagne, 2006. Français. ⟨NNT : 2006TELB0026⟩. ⟨tel-03247307⟩
