I am looking to store every transaction plus order book snapshots (at 1-second intervals) for one symbol on one exchange. The data will go into a database that feeds a backtesting environment as close to the real deal as possible.
Do you guys use HDF5 for your tick storage, or is there a better way to do this? I've briefly looked at HDF5 files, CSV files, InfluxDB, Arctic (on MongoDB), etc. There are so many different solutions.
My current plan of attack is to benchmark InfluxDB against Arctic and see which workflow works better for me. I plan to start with one node and upgrade as I need to. I understand that the databases won't match flat files for read speed into Python, so I was thinking of a two-stage workflow: a master database stores everything I want (ticks, order book, Twitter, Google Trends, etc.), and a nightly scheduled task pulls the relevant tables and molds them into an HDF5 flat file for me to backtest on later.
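To make the nightly step concrete, here is a rough sketch of what I have in mind. It is only an illustration: SQLite stands in for the master database (I would really be querying InfluxDB or Arctic), and the `ticks` table, its `ts`/`price`/`size` columns, and the `ticks/dYYYYMMDD` key layout are all made-up names. The HDF5 write uses pandas `to_hdf`, which needs PyTables installed.

```python
import sqlite3
from datetime import date, datetime, timedelta, timezone


def day_bounds(day):
    """Return the [start, end) bounds of a UTC calendar day as epoch seconds."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return start.timestamp(), (start + timedelta(days=1)).timestamp()


def fetch_ticks(conn, day):
    """Pull one day of trades from the hypothetical `ticks` table, oldest first."""
    start, end = day_bounds(day)
    cur = conn.execute(
        "SELECT ts, price, size FROM ticks "
        "WHERE ts >= ? AND ts < ? ORDER BY ts",
        (start, end),
    )
    return cur.fetchall()


def export_day_to_hdf5(conn, day, path):
    """Append one day of ticks to an HDF5 file under its own key."""
    # Imported lazily so the query helpers above run without pandas/PyTables.
    import pandas as pd

    rows = fetch_ticks(conn, day)
    if not rows:
        return  # nothing traded that day; skip writing an empty table
    df = pd.DataFrame(rows, columns=["ts", "price", "size"])
    df["ts"] = pd.to_datetime(df["ts"], unit="s", utc=True)
    df = df.set_index("ts")
    # One key per day, so a backtest can load only the days it needs.
    df.to_hdf(path, key=f"ticks/d{day:%Y%m%d}", mode="a", format="table")
```

The scheduled task would call `export_day_to_hdf5(conn, yesterday, "ticks.h5")` once per night, and the backtester would read a day back with `pd.read_hdf("ticks.h5", key="ticks/d20170717")`.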
My honest question is this: Is this a stupid approach? What could be better?
Submitted July 18, 2017 at 02:15AM by Pnaul