Machine Learning for Economists: An Introduction
A crash course for economists who would like to learn machine learning.
Why should economists bother at all? Machine learning (ML) generally outperforms econometrics in prediction. That is why ML is becoming more popular in operations, where econometrics’ advantage in tractability is less valuable. So it’s worth knowing both and choosing the approach that best suits your goals.
These articles have been written by economists for economists. Other readers may not appreciate constant references to economic analysis and should start from the next section.
- Athey, Susan, and Guido Imbens. “NBER Lectures on Machine Learning,” 2015. A shortcut from econometrics to machine learning. Key principles and algorithms. Comparative performance of ML.
- Varian, “Big Data: New Tricks for Econometrics.” Some ML algorithms and new sources of data.
- Einav and Levin, “The Data Revolution and Economic Analysis.” Mostly about new data.
Practical applications get little publicity, especially if they are successful. But these materials do give an impression of what the field is about.
- Bloomberg and Flowers, “NYC Analytics.” The NYC Mayor’s Office of Data Analytics describes its data management system and improvements in operations.
- UK Government, Tax Agent Segmentation.
- Data.gov, Applications. Some are ML-based.
- StackExchange, Applications.
Governments use ML sparingly. Developers emphasize open data more than algorithms.
- Kaggle, Data Science Use cases. An outline of business applications. Few companies have the data to implement these things.
- Kaggle, Competitions. (Make sure you choose “All Competitions” and then “Completed”.) Each competition has a leaderboard. When users publish their solutions on GitHub, you can find links to these solutions on the leaderboard.
Industrial solutions are more powerful and complex than these examples, but they are not publicly available. Data-driven companies post some details about this work on their blogs.
Various prediction and classification problems. For ML research, see the last section.
- Stanford’s CS229 Course, Student projects. See “Recent years’ projects.” Hundreds of short papers.
- CMU ML Department, Student projects. More advanced problems, compared to CS229.
A tree of ML algorithms:
Econometricians may check the math behind the algorithms and find it familiar. Mathematical background:
- Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning. Standard reference. More formal approach. [free copy]
- James et al., An Introduction to Statistical Learning. Another standard reference by the same authors. More practical approach with coding. [free copy]
- Kaggle, Metrics. ML problems are all about minimizing prediction errors. These are various definitions of errors.
- (optional) Mitchell, Machine Learning. Close to Hastie, Tibshirani, and Friedman.
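To make the idea of "various definitions of errors" concrete, here is a minimal sketch of four common error metrics written in plain NumPy. The function names and the toy numbers are illustrative assumptions, not taken from Kaggle's list.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, for continuous outcomes."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error; less sensitive to large misses than RMSE."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy, for predicted probabilities of a 0/1 outcome."""
    y_true = np.asarray(y_true, float)
    # Clip probabilities away from 0 and 1 so the logarithm stays finite.
    p = np.clip(np.asarray(p_pred, float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def accuracy(y_true, y_pred):
    """Share of correctly predicted class labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Toy numbers, made up for illustration only.
y = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.0, 5.0, 4.0, 8.0]
print("RMSE:", rmse(y, y_hat))
print("MAE: ", mae(y, y_hat))
print("Log loss:", log_loss([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]))
print("Accuracy:", accuracy([1, 0, 1, 0], [1, 0, 0, 0]))
```

Which metric to minimize is part of the problem statement: RMSE punishes large misses more than MAE does, and log loss rewards well-calibrated probabilities rather than correct labels alone.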
For what makes ML different from econometrics, see chapters “Model Assessment and Selection” and “Model Inference and Averaging” in The Elements.
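Cross-validation is one of the model-assessment tools covered there. The sketch below is a plain NumPy illustration on simulated data, choosing the degree of a polynomial regression by 5-fold cross-validation. The data-generating process, the helper names `kfold_mse` and `in_sample_mse`, and the range of degrees are all assumptions made for this example.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Average out-of-fold MSE of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on k-1 folds, evaluate on the held-out fold.
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

def in_sample_mse(x, y, degree):
    """MSE on the same data used for fitting (the usual regression fit)."""
    coefs = np.polyfit(x, y, degree)
    return float(np.mean((y - np.polyval(coefs, x)) ** 2))

# Simulated data (an assumption for illustration): quadratic signal plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=0.5, size=200)

degrees = range(1, 9)
train_err = {d: in_sample_mse(x, y, d) for d in degrees}
cv_err = {d: kfold_mse(x, y, d) for d in degrees}
best_degree = min(cv_err, key=cv_err.get)

for d in degrees:
    print(f"degree {d}: in-sample MSE {train_err[d]:.3f}, CV MSE {cv_err[d]:.3f}")
print("degree chosen by cross-validation:", best_degree)
```

In-sample error can only fall as the polynomial gets more flexible, so it cannot be used to choose the degree. The held-out error stops improving once the extra terms start fitting noise, and that is the quantity ML model selection looks at.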
Software and Hardware
Stata does not support many ML algorithms. Its counterpart in the ML community is R. R is a language, so you’ll need more tools to make it work:
- RStudio. A standard coding environment. Similar to Stata.
- CRAN packages for ML.
- James et al., An Introduction to Statistical Learning. This text introduces readers to R. Again, it is available for free.
Python is the closest alternative to R. Packages “scikit-learn” and “statsmodels” do ML in Python.
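As a rough sketch of what the scikit-learn workflow looks like, the example below fits a logistic regression and a random forest to the same simulated binary outcome and scores both on held-out data. The data-generating process, sample size, and hyperparameters are assumptions chosen for illustration, so the printed accuracies say nothing about real datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Simulated data (hypothetical): a latent index with one interaction term.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
latent = 1.5 * X[:, 0] - 1.0 * X[:, 1] + X[:, 0] * X[:, 1]
y = (latent + rng.logistic(size=n) > 0).astype(int)

# Hold out a quarter of the sample to measure out-of-sample performance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "logit": LogisticRegression(),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)          # estimate on the training part
    y_pred = model.predict(X_test)       # predict labels for unseen rows
    scores[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: test accuracy {scores[name]:.3f}")
```

Both estimators share the same `fit` and `predict` interface, so swapping a familiar econometric model for an ML one is a one-line change.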
If your datasets and computations get heavier, you can run code on virtual servers from Google and Amazon. Both offer ML-ready instances that execute code faster, and setting one up takes a few minutes.
I limited this survey to economic applications. Other applications of ML include computer vision, speech recognition, and artificial intelligence.
The advantage of ML approaches (like neural networks and random forests) over econometrics (linear and logistic regressions) is substantial in these non-economic applications.
Economic systems often have linear properties, so ML is less impressive here. Nonetheless, it does predict things better, and more practical solutions are being built the ML way.
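The role of linearity can be illustrated with a small simulation. The sketch below compares the out-of-sample error of OLS and a random forest on two made-up data-generating processes, one linear and one strongly nonlinear. Everything here (the functional forms, noise level, sample size, and the helper `compare`) is an assumption for illustration, not evidence about real economic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def compare(signal, n=2000, seed=0):
    """Out-of-sample MSE of OLS and a random forest on simulated data."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-2, 2, size=(n, 2))
    y = signal(X) + rng.normal(scale=0.5, size=n)  # noise variance is 0.25
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "ols": LinearRegression(),
        "forest": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    return {
        name: float(mean_squared_error(y_te, m.fit(X_tr, y_tr).predict(X_te)))
        for name, m in models.items()
    }

# Hypothetical linear and nonlinear relationships between X and y.
linear = compare(lambda X: 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1])
nonlinear = compare(lambda X: np.sin(3 * X[:, 0]) * X[:, 1] ** 2)

print("linear DGP:   ", linear)
print("nonlinear DGP:", nonlinear)
```

When the true relationship is linear, OLS is already the correctly specified model and the forest has nothing to gain. When it is not, the forest picks up structure that a linear specification misses unless the right terms are added by hand.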