Backfilling Data
Backfilling is the process of querying APIs or other sources to retrieve historical candlesticks. Every liquidity firm has its own API, and they differ widely: some send easily readable JSON messages, while others send raw lists of strings and numbers. The formats are far from consistent, but the work is very doable.
Throttling Yourself
When coding a backfiller, the process is straightforward:
- Define a start date and end date to query
- Review the API documentation to determine query speed and response data limits
- Expect to retrieve roughly 4 hours of 1-minute data per query, issuing queries at ~100ms intervals
- Some APIs may allow faster queries (100ms) while others require slower intervals (300-500ms)
- Throttle requests so you stay within the API's rate limits and don't overwhelm the provider's servers
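A minimal throttling sketch, assuming a provider minimum interval of 50ms (`MIN_INTERVAL_S` is a placeholder; use the limit from your API's documentation):

```python
import time

MIN_INTERVAL_S = 0.05  # assumed provider rate limit; check the API docs

class Throttle:
    """Sleep just enough so successive calls are at least MIN_INTERVAL_S apart."""
    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last = 0.0

    def wait(self):
        now = time.monotonic()
        elapsed_since_last = now - self.last
        if elapsed_since_last < self.min_interval:
            time.sleep(self.min_interval - elapsed_since_last)
        self.last = time.monotonic()

throttle = Throttle(MIN_INTERVAL_S)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # a real API request would follow each wait
elapsed = time.monotonic() - start
```

Measuring against a monotonic clock rather than sleeping a fixed amount means slow requests don't add unnecessary delay on top of the provider's limit.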
The Approach
Implement the backfilling process as follows:
- Compile a list of symbols to backfill
- Set a timer to run at the determined query interval (typically ~100ms)
- Pull a symbol from the list and begin querying historical data
- Increase the query range in 4-hour increments (a good balance for most API response limits)
- Continue looping until the current date is reached, then move to the next symbol
- Depending on the size of the symbol list, this process can take hours
- Deploy as a recurring Kubernetes job for automated backfilling without manual intervention
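The loop above can be sketched roughly as follows; `SYMBOLS` and `fetch_candles` are placeholders for your own symbol list and provider client, and the sleep interval is set to zero here only so the sketch runs instantly:

```python
import time
from datetime import datetime, timedelta

SYMBOLS = ["EURUSD", "GBPUSD", "USDJPY"]  # hypothetical symbol list
CHUNK = timedelta(hours=4)                # 4-hour increments per query
INTERVAL_S = 0.0                          # use ~0.1 (100ms) in production

def fetch_candles(symbol, start, end):
    """Stand-in for the real provider API call."""
    return {"symbol": symbol, "start": start, "end": end}

def backfill_symbol(symbol, start, now):
    """Walk [start, now) in 4-hour chunks; return the request count."""
    requests = 0
    cursor = start
    while cursor < now:
        fetch_candles(symbol, cursor, min(cursor + CHUNK, now))
        cursor += CHUNK
        requests += 1
        time.sleep(INTERVAL_S)  # throttle between queries
    return requests

def run_backfill(start, now):
    """Process each symbol in turn until the current date is reached."""
    return {s: backfill_symbol(s, start, now) for s in SYMBOLS}

counts = run_backfill(datetime(2024, 1, 1), datetime(2024, 1, 1, 12))
```

Wrapped in a container, this is the kind of entrypoint you would schedule as a Kubernetes CronJob so the backfill recurs without manual intervention.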
Our Approach
Our approach is essentially the same as above, but we recommend backfilling your data into CSV files, since QuestDB can import these files using the COPY command. The process is as follows:
- Backfilling service runs as a job in Kubernetes
- CSV files are stored on disk on your NAS
- Write a script that copies the historical CSV data from your NAS into .questdb/import, runs the COPY command to load the data into a temporary table, and then inserts that table into your main historical table
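A rough sketch of such a script, assuming a NAS mount at `/mnt/nas/backfill` and QuestDB's server import directory; the table and file names are hypothetical, and a real run would send each statement to QuestDB's `/exec` HTTP endpoint (here the script only builds the SQL):

```python
import shutil
from pathlib import Path

NAS_DIR = Path("/mnt/nas/backfill")          # assumed NAS mount point
IMPORT_DIR = Path("/root/.questdb/import")   # QuestDB server import directory

def build_import_sql(csv_name, staging_table, main_table):
    """Build the SQL to COPY into a staging table, then merge into the main table."""
    return [
        # COPY reads files from the server's import directory
        f"COPY {staging_table} FROM '{csv_name}' WITH HEADER true;",
        f"INSERT INTO {main_table} SELECT * FROM {staging_table};",
        f"DROP TABLE {staging_table};",
    ]

def stage_and_import(csv_path, staging_table, main_table, dry_run=True):
    """Copy the CSV from the NAS into the import dir and return the SQL to run."""
    if not dry_run:
        shutil.copy(csv_path, IMPORT_DIR / csv_path.name)  # NAS -> import dir
    return build_import_sql(csv_path.name, staging_table, main_table)

stmts = stage_and_import(NAS_DIR / "EURUSD_2024.csv", "eurusd_tmp", "candles_1m")
```

Going through a temporary staging table keeps a bad CSV from polluting the main historical table: the INSERT only runs once the COPY has succeeded.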