Cube Sampled K-Prototype Clustering for Featured Data

Seemandhar Jain; Aditya A Shastri; Kapil Ahuja; Yann Busnel; Navneet Pratap Singh

doi:10.1109/INDICON52576.2021.9691727

Conference paper, Year: 2021

Seemandhar Jain

Role: Author

Indian Institute of Technology Indore

Aditya A Shastri

Role: Author
PersonId: 1122163

SVKM's NMIMS

Kapil Ahuja

Role: Author
PersonId: 1105675

Indian Institute of Technology Indore

Yann Busnel

Role: Author
PersonId: 823
IdHAL: yann-busnel
ORCID: 0000-0001-6908-719X
IdRef: 129816647

Département Systèmes Réseaux, Cybersécurité et Droit du numérique

Dependability Interoperability and perfOrmance aNalYsiS Of networkS

Navneet Pratap Singh

Role: Author
PersonId: 1122164

SVKM's NMIMS

Abstract

Clustering large amounts of data is becoming increasingly important. Because of the large size of such data, clustering algorithms often take too much time. Sampling the data before clustering is commonly used to reduce this time. In this work, we propose a probabilistic sampling technique called cube sampling along with K-Prototype clustering. Cube sampling is used because of its accurate sample selection. K-Prototype is the most frequently used clustering algorithm when the data is both numerical and categorical (very common today). The novelty of this work lies in obtaining the crucial inclusion probabilities for cube sampling using Principal Component Analysis (PCA). Experiments on multiple datasets from the UCI repository demonstrate that the cube sampled K-Prototype algorithm gives the best clustering accuracy among other popular clustering algorithms sampled in the same way (K-Means, Hierarchical Clustering (HC), and Spectral Clustering (SC)). When compared with unsampled K-Prototype, K-Means, HC, and SC, it still has the best accuracy, with the added advantage of reduced computational complexity (due to the reduced data size).
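
The K-Prototype step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard K-Prototype dissimilarity (squared Euclidean distance on numerical attributes plus a weight, here called `gamma`, times the number of categorical mismatches) and treats the sampled rows as given. The cube sampling step and the PCA-based inclusion probabilities are not reproduced here. The function names, the `init` parameter, the `gamma` value, and the toy data are all hypothetical.

```python
from collections import Counter


def mixed_distance(x_num, x_cat, p_num, p_cat, gamma):
    """Squared Euclidean distance on numeric attributes plus gamma times the
    number of categorical mismatches (assumed standard K-Prototype form)."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    mismatches = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric + gamma * mismatches


def k_prototypes(num, cat, init, gamma=1.0, max_iter=20):
    """Cluster mixed data. `init` lists the row indices used as the initial
    prototypes (a hypothetical parameter that keeps the sketch deterministic)."""
    k, n = len(init), len(num)
    protos = [(list(num[i]), list(cat[i])) for i in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest prototype under the mixed dissimilarity.
        new_labels = [
            min(range(k), key=lambda c: mixed_distance(
                num[i], cat[i], protos[c][0], protos[c][1], gamma))
            for i in range(n)
        ]
        if new_labels == labels:
            break  # assignments are stable, so the prototypes are too
        labels = new_labels
        # Update step: mean of numeric attributes, mode of categorical ones.
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                continue  # keep the previous prototype for an empty cluster
            mean = [sum(num[i][j] for i in members) / len(members)
                    for j in range(len(num[0]))]
            mode = [Counter(cat[i][j] for i in members).most_common(1)[0][0]
                    for j in range(len(cat[0]))]
            protos[c] = (mean, mode)
    return labels, protos


# Toy rows standing in for an already-drawn sample (hypothetical data).
sample_num = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
sample_cat = [["red"], ["red"], ["blue"], ["blue"], ["blue"], ["blue"]]

labels, protos = k_prototypes(sample_num, sample_cat, init=[0, 3])
print(labels)
print(protos)
```

Because the clustering runs only on the sampled rows, its cost depends on the sample size rather than the full dataset size, which is the source of the reduced computation the abstract refers to.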

Keywords

Sampling; Cube Sampling; Clustering; K-Prototype Clustering; Principal Component Analysis; Clustering Accuracy

Domains

Algorithms and data structures [cs.DS]; Parallel, distributed, and shared computing [cs.DC]; Information theory [cs.IT]

Main file

m51272-jain final.pdf (291.17 KB)

Origin: Files produced by the author(s)

https://imt-atlantique.hal.science/hal-03515281

Submitted on: Thursday, January 6, 2022, 16:16:34

Last modified on: Tuesday, December 12, 2023, 09:55:39

Long-term archiving on: Thursday, April 7, 2022, 19:44:53

Dates and versions

hal-03515281 , version 1 (06-01-2022)

Identifiers

HAL Id : hal-03515281 , version 1
ARXIV : 2108.10262
DOI : 10.1109/INDICON52576.2021.9691727

Cite

Seemandhar Jain, Aditya A Shastri, Kapil Ahuja, Yann Busnel, Navneet Pratap Singh. Cube Sampled K-Prototype Clustering for Featured Data. INDICON 2021: IEEE 18th India Council International Conference, Dec 2021, Guwahati, India. pp.1-6, ⟨10.1109/INDICON52576.2021.9691727⟩. ⟨hal-03515281⟩

Collections

INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES IMT-ATLANTIQUE UR1-MATH-NUM CYBERSCHOOL

52 views

100 downloads
