Data-Driven Methods in Finance

The Data-Driven Methods in Finance (DDMIF) course equips students with the skills needed to use data for quantitative research in finance and trading. The curriculum draws on textbooks including Quantitative Equity Portfolio Management, An Introduction to Statistical Learning, and Introduction to Computational Finance and Financial Econometrics, among others.

The course uses data from reputable sources, such as Yahoo Finance, OpenBB, and WRDS. In class, only established financial strategies, textbook algorithms, open-source software, and freely available datasets are used. The course is computer-intensive, and students must have a basic working knowledge of Python, including NumPy, pandas, and Matplotlib. They should also have a background in linear algebra, probability, and statistics. Familiarity with web scraping (e.g., BeautifulSoup), optimization packages (e.g., Gurobi or CVXPY), operations research topics, and the Google Colab environment is recommended.

The DDMIF course covers various critical topics, such as sample statistics, forecasting, machine learning methods, data scraping, and MLOps. The course focuses on the primary data science workflow that involves generating ideas, sourcing information, extracting features, combining signals, optimizing decisions, and evaluating performance. One unique aspect of the course is an in-class real-time financial forecasting competition, similar to the M6 competition. This competition provides students with valuable hands-on experience and prepares them to apply their learning in real-world financial situations.
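The workflow described above can be illustrated with a toy example. The sketch below is not course material: it is a minimal illustration under stated assumptions, using synthetic random returns in place of real data, two textbook-style signals (momentum and short-term reversal), and a simple normalization in place of a real optimizer. The names `zscore` and `run_pipeline` and all parameter values are hypothetical.

```python
import numpy as np


def zscore(x):
    """Standardize each row (one date) across assets."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / sd


def run_pipeline(n_days=500, n_assets=20, lookback=20, seed=0):
    rng = np.random.default_rng(seed)

    # 1. Source information: synthetic i.i.d. daily returns (assumption;
    #    a real study would load prices from a data provider instead).
    returns = rng.normal(0.0, 0.01, size=(n_days, n_assets))

    # 2. Extract features using only information available up to day t.
    csum = np.cumsum(returns, axis=0)
    momentum = csum[lookback:] - csum[:-lookback]  # sum of the last `lookback` returns
    reversal = -returns[lookback:]                 # negative of the day-t return

    # 3. Combine signals: equal-weight average of cross-sectional z-scores.
    signal = 0.5 * zscore(momentum) + 0.5 * zscore(reversal)

    # 4. Decide: dollar-neutral weights with unit gross exposure
    #    (a simple stand-in for a portfolio optimizer).
    weights = signal - signal.mean(axis=1, keepdims=True)
    weights = weights / np.abs(weights).sum(axis=1, keepdims=True)

    # 5. Evaluate: weights formed on day t earn the day t+1 returns.
    pnl = (weights[:-1] * returns[lookback + 1:]).sum(axis=1)
    sharpe = np.sqrt(252) * pnl.mean() / pnl.std()
    return weights, pnl, sharpe


if __name__ == "__main__":
    weights, pnl, sharpe = run_pipeline()
    print(f"days traded: {pnl.size}, annualized Sharpe: {sharpe:.2f}")
```

Because the synthetic returns are independent noise, the signals carry no predictive information, and any nonzero Sharpe ratio this sketch reports is sampling error. Shifting the returns by one day in the evaluation step keeps the backtest free of look-ahead.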

The DDMIF course is offered by the Department of Industrial Engineering and Operations Research at Columbia University. The material has been developed and refined over time, drawing on the instructor’s experience teaching related courses, and has evolved from earlier courses on Operations Research and Optimization taught at the Tandon School of Engineering of New York University.

Disclaimer

The course material is based solely on the textbooks mentioned in the course description and is intended to provide education on using data science and machine learning methods to analyze financial data. The course does not offer any investment advice or pre-constructed trading algorithms. The views expressed in the course do not necessarily represent those of the author, affiliated entities, or agencies.

The objective of the course is to highlight the challenges that arise when applying data science and machine learning methods to financial data. These challenges include short data histories, non-stationarity, shifts in market conditions, and low signal content, all of which make robust results difficult to achieve. The topics covered are intended as a reference for using these methods to make informed investment decisions in a systematic and scientific manner.