
The Vital Role of Publicly Available Datasets in Quantitative Finance Research

In recent years, the importance of publicly available datasets in quant finance research has increased dramatically. These datasets serve as the lifeblood of quantitative analysis, enabling researchers and practitioners to develop and test their models, gain valuable insights, and make more informed investment decisions. On a journey to advance and democratize AI through open source and open science, aisot has prepared such a dataset. 

 


 

Quantitative finance relies heavily on data to make informed decisions. Datasets are used as inputs to train the machine learning models that quants need to construct and validate their strategies. These datasets encompass a wide range of financial market data, economic indicators, and many so-called alternative datasets. aisot’s Head of R&D, Dr. Nino Antulov-Fantulin, and AI & Quant Advisory Lead, Dr. Petter Kolm, have released a dataset of trades and limit order book snapshots for BTC/USD (i.e. the Bitcoin / US dollar currency pair) on the Bitstamp exchange (https://www.bitstamp.net), covering May 31, 2018 through September 30, 2018. Trade data is at millisecond frequency. Limit order book snapshots are at minute frequency, with aggregated amounts for each price level and depth up to 5000 for the bid and ask sides.
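To make the two frequencies concrete, here is a minimal Python sketch of how such data might be combined: millisecond trades are grouped into one-minute buckets, and simple features are computed from a minute order book snapshot. The record layouts, field names, and sample values are hypothetical and chosen only for illustration, and `depth` is assumed here to count price levels. The released files may be structured differently.

```python
from collections import defaultdict

# Hypothetical layouts for illustration; not the dataset's actual schema.
# Trade: (timestamp in ms since epoch, price in USD, amount in BTC)
trades = [
    (1527724800123, 7500.0, 0.5),
    (1527724830456, 7502.0, 1.5),
    (1527724860789, 7498.0, 1.0),
]

# One minute snapshot: (price, aggregated amount) per level, best level first.
snapshot = {
    "timestamp_ms": 1527724860000,
    "bids": [(7499.0, 2.0), (7498.0, 4.0)],
    "asks": [(7501.0, 1.0), (7503.0, 3.0)],
}


def minute_vwap(trades):
    """Return the volume-weighted average price per one-minute bucket."""
    notional = defaultdict(float)
    volume = defaultdict(float)
    for ts_ms, price, amount in trades:
        bucket = ts_ms // 60_000 * 60_000  # floor to the start of the minute
        notional[bucket] += price * amount
        volume[bucket] += amount
    return {b: notional[b] / volume[b] for b in sorted(volume)}


def book_features(snapshot, depth=5000):
    """Compute mid-price, spread, and volume imbalance from one snapshot.

    `depth` is assumed to be the number of price levels used per side.
    """
    bids = snapshot["bids"][:depth]
    asks = snapshot["asks"][:depth]
    best_bid, best_ask = bids[0][0], asks[0][0]
    bid_volume = sum(amount for _, amount in bids)
    ask_volume = sum(amount for _, amount in asks)
    return {
        "mid": (best_bid + best_ask) / 2,
        "spread": best_ask - best_bid,
        "imbalance": (bid_volume - ask_volume) / (bid_volume + ask_volume),
    }


vwap_by_minute = minute_vwap(trades)
features = book_features(snapshot)
```

Features like these could then be joined on the minute timestamp to form model inputs. The choice of VWAP and volume imbalance is just one common option, not something prescribed by the dataset.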

Importance of publicly available datasets

Publicly available datasets and benchmarks are crucial for advancing machine learning. They provide a standardized basis for evaluating and comparing the performance of different algorithms and models, allowing researchers to measure progress objectively. Moreover, they democratize access to essential data, reducing barriers to entry for new researchers and fostering collaboration within the community. Public datasets also stimulate innovation by inspiring researchers to tackle challenging problems and develop novel techniques that achieve state-of-the-art results, ultimately driving the field forward.

In sub-fields of machine learning like computer vision, public datasets have proven to be pivotal for the advancement of the field. They facilitate model comparison, encourage innovation through healthy competition, and enable the development of more accurate and specialized algorithms, ultimately driving advancements in areas like image classification, object detection, 3D vision, medical imaging, and more. These resources serve as a foundation for standardized evaluation, transfer learning, and collaborative research efforts in the field.

In contrast to computer vision, the use of publicly available datasets is not yet common practice in quant finance. Rather than using public datasets, banks and asset managers purchase large amounts of data for in-house research and production. Without publicly available datasets, it is challenging to evaluate the effectiveness of models or to compare them on a common basis. Public datasets bring some obvious benefits to quant finance:

  • Research and Innovation
    Quantitative finance is an ever-evolving field. Researchers are continually developing new models and strategies to adapt to changing market conditions. Public datasets provide excellent benchmarks to advance research in academia and industry. Making datasets freely available fosters a culture of knowledge sharing, leading to the rapid advancement of quantitative techniques and the development of more sophisticated models.
  • Education and Skill Development
    Access to publicly available datasets also plays a pivotal role in educating future quants and analysts. Universities and educational institutions leverage these datasets to teach students the practical aspects of quantitative finance. By working with real-world data, students gain valuable experience and are better prepared for careers in finance. Moreover, aspiring quants can refine their skills by experimenting with datasets, fostering a new generation of quantitative professionals.

As in other ML fields such as computer vision, it is important to make more datasets publicly available for quantitative finance research. While research benchmarking is the most obvious benefit of public datasets, they also empower researchers, analysts, and investors to make data-driven decisions, develop innovative approaches, and promote transparency and accountability in the financial industry. On a broader scale, their availability not only benefits professionals in the industry but can also contribute to the overall stability and integrity of financial markets worldwide.

Access aisot’s public dataset