Deep Embedding and Clustering — step-by-step python implementation

Elahe Naserian
Jul 3, 2022

In this article, we discuss deep image clustering and, more specifically, Unsupervised Deep Embedding for Clustering (DEC). We implement the DEC model, using a pretrained network (VGG-16) as a transfer-learning feature extractor. The implementation uses the PyTorch library, and we train the model and evaluate its clustering performance on the STL-10 dataset.

Image Clustering

Clustering is often considered one of the most important techniques in unsupervised learning: it lets us discover hidden relationships among the data points in a dataset. Image clustering is an essential data analysis tool in machine learning and computer vision. It is the process of grouping images into clusters such that images within the same cluster are similar to each other, while images in different clusters are dissimilar.

Traditional clustering methods such as K-means and agglomerative clustering have been successful, but they depend heavily on a distance or dissimilarity metric. Distance, in turn, depends on how the data is represented in a feature space; for images, this feature space might be the raw pixels or histograms of gradient orientations. The K-means algorithm, for example, uses the Euclidean distance between points in a given feature space. The choice of feature space is therefore crucial, especially for high-dimensional data such as image datasets…
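To make the role of the distance metric concrete, here is a minimal sketch of K-means with Euclidean distance in plain Python. It is an illustration only, not the code used later in the article: the function names (euclidean, kmeans), the six two-dimensional toy points, and the fixed initial centroids are all assumptions chosen to keep the example small and deterministic. Real implementations usually pick the initial centroids randomly or with a scheme such as k-means++.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points in the same feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, centroids, n_iter=10):
    """Plain K-means: alternate an assignment step and an update step.

    `centroids` holds the starting centers (a simplifying assumption;
    libraries normally initialize them randomly).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy 2-D "features": two visibly separated groups (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

Every decision the algorithm makes goes through euclidean, so the result depends entirely on how the points are represented. If the same images were described by raw pixels instead of well-chosen features, the distances, and therefore the clusters, could look very different.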
