28 October 2016
Carlos Guestrin, Amazon Professor of Machine Learning in Computer Science & Engineering at the University of Washington and CEO of Dato, does not have one favourite algorithm; he has a few, each for its own reason. These are the most useful algorithms in Machine Learning:
Most elegant: The Perceptron algorithm. Developed back in the 50s by Rosenblatt and colleagues, this extremely simple algorithm can be viewed as the foundation for some of the most successful classifiers today, including support vector machines and logistic regression, solved using stochastic gradient descent. The convergence proof for the Perceptron algorithm is one of the most elegant pieces of math I’ve seen in ML.
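To show how simple the algorithm is, here is a minimal sketch of the classic mistake-driven update in plain Python. The function names, the epoch cap and the toy data (a logical AND) are assumptions made for illustration; they do not come from the interview.

```python
def train_perceptron(samples, labels, max_epochs=100):
    """Perceptron training for labels in {-1, +1} (illustrative sketch)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:
                # Misclassified (or on the boundary): nudge the separator
                # towards the correct side of this example.
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:
            # A full pass with no mistakes: the data has been separated.
            break
    return weights, bias


def perceptron_predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Hypothetical toy data: logical AND, which is linearly separable.
and_samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [-1, -1, -1, 1]
and_weights, and_bias = train_perceptron(and_samples, and_labels)
print([perceptron_predict(and_weights, and_bias, x) for x in and_samples])
```

The convergence proof mentioned above guarantees that, on linearly separable data like this, the loop makes only a finite number of mistakes before the early exit triggers.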
Most useful: Boosting, especially boosted decision trees. This intuitive approach allows you to build highly accurate ML models by combining many simple ones. Boosting is one of the most practical methods in ML: it’s widely used in industry, can handle a wide variety of data types, and can be implemented at scale. I recommend checking out XGBoost for a really scalable implementation of boosted trees. Boosting also lends itself to very elegant proofs.
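The idea of combining many simple models can be sketched with AdaBoost over one-dimensional threshold "stumps". Note that this is a deliberately simpler stand-in: it is not the gradient-boosted trees that XGBoost implements, and the function names and toy data are assumptions chosen for illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    """A decision stump: one threshold test on a single number."""
    return polarity if x <= threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """AdaBoost with stumps for labels in {-1, +1} (illustrative sketch)."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = max(error, 1e-10)  # avoid division by zero below
        if error >= 0.5:
            break  # no stump is better than chance on the current weights
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def boosted_predict(ensemble, x):
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: no single threshold separates these labels.
toy_xs = [1, 2, 3, 4, 5, 6]
toy_ys = [1, 1, -1, -1, 1, 1]
toy_ensemble = adaboost(toy_xs, toy_ys, rounds=3)
print([boosted_predict(toy_ensemble, x) for x in toy_xs])
```

Each stump on its own gets at least two of the six points wrong, yet the weighted vote of three stumps fits all of them, which is the "many simple models" effect in miniature.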
Biggest comeback: Convolutional neural network deep learning. This type of neural network has been around since the early 80s. Although interest in them declined from the late nineties to the late 2000s, they have seen an amazing comeback in the last five years. In particular, convolutional neural networks form the core of the deep learning models that have been having a huge impact, especially in computer vision and speech recognition.
Most beautiful algorithm: Dynamic programming (e.g., Viterbi, forward-backward, variable elimination & belief propagation algorithms). Dynamic programming is one of the most elegant algorithmic techniques in computer science, since it allows you to search through an exponentially large space to find the optimal solution. This idea has been applied in various ways in ML, especially for graphical models, such as hidden Markov models, Bayesian networks and Markov networks.
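As one concrete instance, here is a minimal sketch of the Viterbi algorithm for a hidden Markov model. The two-state model and all of its probabilities are hypothetical toy values, and the function signature is an assumption for illustration. The number of possible state sequences grows exponentially with the length of the observation sequence, but the table below is filled in time proportional to the sequence length times the square of the number of states.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an HMM (illustrative sketch)."""
    # best[t][s]: probability of the likeliest path ending in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that likeliest path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace the stored predecessors back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: hidden health state, observed symptoms.
hmm_states = ["Healthy", "Fever"]
hmm_start = {"Healthy": 0.6, "Fever": 0.4}
hmm_trans = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
hmm_emit = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
hmm_path, hmm_prob = viterbi(
    ["normal", "cold", "dizzy"], hmm_states, hmm_start, hmm_trans, hmm_emit
)
print(hmm_path, hmm_prob)
```

For long sequences the products of probabilities underflow, so a practical version would typically work with log-probabilities and sums instead; plain products are used here only to keep the sketch short.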
Unbeatable baseline: Nearest-neighbor algorithm. Often, when you are trying to write a paper, you want to show that “my curve is better than your curve”. 🙂 One way to do that is to introduce a baseline approach, and show that your method is more accurate. Well… nearest-neighbor is the simplest baseline to implement, so often folks will try it first, thinking they’ll easily beat it and show their method is awesome. To their surprise, nearest-neighbor can be extremely hard to beat! In fact, if you have enough data, nearest neighbor is extremely powerful! And this method is really useful in practice.
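Part of the appeal is how little code the baseline needs. Below is a minimal sketch of a 1-nearest-neighbor classifier using squared Euclidean distance; the function names, the distance choice and the toy points are assumptions for illustration.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to the query."""
    nearest = min(
        range(len(train_points)),
        key=lambda i: squared_distance(train_points[i], query),
    )
    return train_labels[nearest]


# Hypothetical toy data: two small clusters in the plane.
nn_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
nn_labels = ["a", "a", "b", "b"]
print(nearest_neighbor_predict(nn_points, nn_labels, (1.2, 1.4)))
print(nearest_neighbor_predict(nn_points, nn_labels, (5.5, 5.0)))
```

There is no training step at all: the "model" is the data itself, which is why the method keeps improving as more data becomes available. The linear scan over all points is the simplest possible lookup; large datasets usually call for an index structure instead.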