Data science techniques for pattern recognition, data mining, k-means clustering, hierarchical clustering, and kernel density estimation (KDE).
What you’ll learn
- Understand the regular K-Means algorithm
- Understand and enumerate the disadvantages of K-Means Clustering
- Understand the soft or fuzzy K-Means Clustering algorithm
- Implement Soft K-Means Clustering in Code
- Understand Hierarchical Clustering
- Explain algorithmically how Hierarchical Agglomerative Clustering works
- Apply Scipy’s Hierarchical Clustering library to data
- Understand how to read a dendrogram
- Understand the different distance metrics used in clustering
- Understand the difference between single linkage, complete linkage, Ward linkage, and UPGMA
- Understand the Gaussian mixture model and how to use it for density estimation
- Write a GMM in Python code
- Explain when GMM is equivalent to K-Means Clustering
- Explain the expectation-maximization algorithm
- Understand how GMM overcomes some disadvantages of K-Means
- Understand the Singular Covariance problem and how to fix it
Requirements
- Know how to code in Python and Numpy
- Install Numpy and Scipy
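As an illustration of the soft K-Means algorithm listed above, here is a minimal NumPy sketch. It assumes the common formulation in which each point's responsibility for cluster k is a softmax of -beta times its squared distance to mean k, and each mean is then the responsibility-weighted average of all points. The function name `soft_kmeans`, the stiffness parameter `beta`, the random initialization, and the demo data are illustrative assumptions, not the course's own code.

```python
import numpy as np


def soft_kmeans(X, K, beta=1.0, max_iters=100, tol=1e-6, seed=0):
    """Hypothetical soft K-Means sketch. Returns (means, responsibilities)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialize the K means with K distinct data points chosen at random.
    M = X[rng.choice(N, size=K, replace=False)].copy()
    for _ in range(max_iters):
        # Squared Euclidean distance from every point to every mean: shape (N, K).
        sq_dists = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        # Soft assignment: softmax of -beta * distance over the K clusters.
        # Subtracting each row's minimum distance keeps exp() from underflowing.
        logits = -beta * (sq_dists - sq_dists.min(axis=1, keepdims=True))
        R = np.exp(logits)
        R /= R.sum(axis=1, keepdims=True)
        # Each new mean is the responsibility-weighted average of all points.
        new_M = (R.T @ X) / R.sum(axis=0)[:, None]
        converged = np.max(np.abs(new_M - M)) < tol
        M = new_M
        if converged:
            break
    # R holds the responsibilities computed in the last assignment step.
    return M, R


# Demo on hypothetical data: two Gaussian blobs centered at (0, 0) and (5, 5).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
means, R = soft_kmeans(X, K=2, beta=1.0)
print(np.round(means, 2))
```

Every row of `R` sums to 1, so each point is shared between clusters rather than assigned to exactly one. As `beta` grows, the softmax approaches an argmin over distances and the updates approach those of regular (hard) K-Means; smaller values of `beta` give softer, more overlapping assignments.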
Description
Cluster analysis is a staple of unsupervised machine learning and data science. It is especially useful for data mining and big data because it finds patterns in the data automatically, without needing the labels that supervised machine learning requires.
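To show what finding groups without labels looks like in practice, here is a short sketch using SciPy's `scipy.cluster.hierarchy` module on hypothetical synthetic data (three Gaussian blobs; the centers, spread, and random seed are assumptions). For each linkage criterion it builds the agglomerative merge tree with `linkage` and then cuts the tree into three flat clusters with `fcluster`. In SciPy, `method="average"` corresponds to UPGMA.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data: three well-separated 2-D blobs of 20 points each.
# No labels are passed to the clustering; it only sees the coordinates.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in centers])

for method in ["single", "complete", "average", "ward"]:
    # Z is the (n - 1) x 4 merge table: the two clusters joined at each step,
    # the distance between them, and the size of the new cluster.
    Z = linkage(X, method=method, metric="euclidean")
    # Cut the tree so that at most 3 flat clusters remain (labels start at 1).
    labels = fcluster(Z, t=3, criterion="maxclust")
    # Number of points in each flat cluster.
    print(method, np.bincount(labels)[1:])
```

On data this well separated, every method should report three clusters of 20 points. On overlapping data the methods diverge (single linkage, for example, tends to chain nearby clusters together), which is why the choice of linkage matters. Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the merge tree; plotting requires Matplotlib.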
Prerequisites
- Linear algebra
- Python coding: if/else, loops, lists, dicts, sets
- Numpy coding: matrix and vector operations, loading a CSV file
Tips for getting through the course
- Watch it at 2x.
- Take handwritten notes. This will drastically increase your ability to retain the information.
- Write down the equations. If you don’t, I guarantee it will just look like gibberish.
- Ask lots of questions on the discussion board. The more the better!
- Realize that most exercises will take you days or weeks to complete.
- Write code yourself; don’t just sit there and look at my code.
- Check out the lecture “What order should I take your courses in?” (available in the Appendix of any of my courses, including the free Numpy course).
Who this course is for
- Students and professionals interested in machine learning and data science
- People who want an introduction to unsupervised machine learning and cluster analysis
- People who want to know how to write their own clustering code
- Professionals interested in data mining big data sets to look for patterns automatically