9 Oct, 2017


  • Understanding of Python at beginner or intermediate level is useful

This course is ideal for anyone interested in data mining or data analysis.

Most of the world's data (whether text, audio, visual, etc.) is raw or unlabeled. This is precisely why unsupervised machine learning has become so important. Unsupervised approaches such as clustering let us discover patterns or underlying structure in data, which makes them a major component of exploratory data analysis (EDA). EDA is used to draw hypotheses, to assess the assumptions behind our statistical inferences, and as a basis for further research. For example, the conclusions of a cluster analysis could lead to the launch of a full-scale experiment.

The course starts by covering two of the most important and common non-hierarchical clustering algorithms, K-means and DBSCAN, using Python. Later, I cover hierarchical clustering with the agglomerative method, using the SAS programming language. Quite a few examples are used to aid learning.
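The course teaches agglomerative clustering in SAS. For readers who want to see the same bottom-up idea in Python, here is a minimal pure-Python sketch: every point starts as its own cluster, and the two closest clusters are merged repeatedly until the requested number remain. Single linkage (cluster distance = distance of the closest pair of points) is an assumption made for simplicity, and `single_linkage` and `agglomerative` are hypothetical names, not the course's code. Libraries such as scikit-learn's `AgglomerativeClustering` provide optimized versions with other linkage choices.

```python
import math
from itertools import combinations


def single_linkage(a, b, points):
    """Distance between clusters a and b: the closest pair of their member points."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points, n_clusters):
    """Bottom-up clustering: start with one cluster per point, merge until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Find the two closest clusters (ties go to the first pair found).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected by the deletion
    # Turn the list of clusters into one label per point.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerative(points, n_clusters=2))  # [0, 0, 0, 1, 1, 1]
```

This naive version recomputes all pairwise cluster distances at every merge, which is fine for a handful of points but far too slow for real data sets.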

With K-means, we start with a ‘starter’ (or simple) example. We then discuss the ‘Completeness Score’. In the next lesson, we discuss how K-means deals with larger variances and different cluster shapes. Then we discuss ‘Color Quantization’, which reduces the number of colors in an image; it is useful when you want to shrink an image's size and/or see whether the image has any underlying structure. Finally, we take a look at cells of the human body and do some cell segmentation. For DBSCAN, we also start with a starter example, using blobs. Then I show you how DBSCAN overcomes some of the issues of K-means.
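To make the starter example and the completeness score concrete, here is a minimal pure-Python sketch of K-means (Lloyd's algorithm: alternate assigning points to the nearest centroid and moving each centroid to the mean of its points) together with the completeness score, 1 - H(K|C)/H(K), which equals 1.0 when every true class ends up inside a single cluster. The deterministic farthest-first seeding is an assumption chosen for reproducibility (scikit-learn's `KMeans` defaults to randomized k-means++ seeding), the toy data is made up, and the function names are hypothetical. In practice you would use `sklearn.metrics.completeness_score` rather than this hand-rolled version.

```python
import math
from collections import Counter


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-first seeding (assumed, for reproducibility)."""
    # Seeding: start from the first point, then repeatedly add the point
    # farthest from its nearest already-chosen centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: the assignments can no longer change
            break
        centroids = new_centroids
    return labels, centroids


def completeness(labels_true, labels_pred):
    """Completeness = 1 - H(K|C) / H(K), using natural logarithms."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    pair_counts = Counter(zip(labels_true, labels_pred))
    # Entropy of the cluster assignment, H(K).
    h_k = -sum(m / n * math.log(m / n) for m in cluster_sizes.values())
    if h_k == 0:
        return 1.0  # a single cluster is trivially complete
    # Conditional entropy of the clusters given the true classes, H(K|C).
    h_k_given_c = -sum(m / n * math.log(m / class_sizes[c]) for (c, _), m in pair_counts.items())
    return 1.0 - h_k_given_c / h_k


points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
true_labels = [0, 0, 0, 0, 1, 1, 1, 1]
labels, centroids = kmeans(points, k=2)
print(labels)                              # [0, 0, 0, 0, 1, 1, 1, 1]
print(centroids)                           # [(0.5, 0.5), (8.5, 8.5)]
print(completeness(true_labels, labels))   # 1.0
print(completeness(true_labels, [0] * 8))  # 1.0: one big cluster is also "complete"
```

The last line shows why completeness is normally read alongside homogeneity: lumping every point into one cluster trivially keeps each class together, so completeness alone cannot detect over-merging.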
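To illustrate how DBSCAN sidesteps some of K-means' issues, here is a minimal pure-Python DBSCAN sketch run on two concentric rings. K-means handles this shape poorly because it assigns every point to its nearest centroid, so it can only carve the plane into convex regions. DBSCAN instead grows clusters outward from dense "core" points (points with at least `min_samples` neighbours within `eps`, counting themselves), needs no k, and labels sparse points as noise (-1). The ring data and the values `eps=1.5` and `min_samples=3` are assumptions chosen for this toy example, not the course's blob data; `sklearn.cluster.DBSCAN` is the optimized library version.

```python
import math
from collections import deque


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or -1 for noise (a plain O(n^2) sketch)."""
    NOISE = -1
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # provisional: may later turn out to be a border point
            continue
        # i is a core point: grow a new cluster outward from it.
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_samples:
                queue.extend(j_seeds)  # j is also a core point: keep expanding
        cluster += 1
    return labels


def ring(radius, n):
    """n evenly spaced points on a circle of the given radius, centred at the origin."""
    return [(radius * math.cos(2 * math.pi * t / n), radius * math.sin(2 * math.pi * t / n))
            for t in range(n)]


points = ring(1, 12) + ring(5, 24) + [(20, 20)]  # two concentric rings plus one outlier
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels[:12] == [0] * 12, labels[12:36] == [1] * 24, labels[36])  # True True -1
```

Each ring comes out as its own cluster and the far-away point is flagged as noise. The trade-off is that `eps` and `min_samples` must suit the data's density, much as K-means needs a sensible k.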

Who is the target audience?
  • Students interested in clustering techniques and unsupervised machine learning
  • Anyone interested in data mining and/or data analysis
