Bondeaux

Reputation: 174

Most efficient way to store financial data (Python)

The data consists of Date, Open, High, Low, Close, and Volume, and it is currently stored in a .csv file that is updated every minute, so the file keeps growing and growing. The problem is that when I need just 500 observations, I have to import the whole .csv file, which is slow. I especially need to be able to access the data fast.

In Python I use the data mostly in a pandas DataFrame or Panel.
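To illustrate (the file name is just a placeholder), this is roughly the current pattern:

import pandas as pd

# The entire, ever-growing file gets parsed just to pull out 500 rows.
df = pd.read_csv("prices.csv", parse_dates=["Date"])
last_500 = df.tail(500)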

Upvotes: 2

Views: 1366

Answers (3)

tomcounsell

Reputation: 5081

This was also my problem in 2017, and I struggled with the performance of relational databases. Switching to Redis was 20x faster, although it was quite difficult to implement. To make things easier for myself and others, I wrote an open-source ORM for Python applications on Redis.

from datetime import datetime

import popoto

# DEFINE YOUR MODEL
class AssetPrice(popoto.Model):
    asset = popoto.KeyField()
    timestamp = popoto.SortedField(type=datetime, sort_by=('asset',))
    close_price = popoto.FloatField()


# ADD DATA AS IT STREAMS IN
AssetPrice.create(asset="Bitcoin", timestamp=datetime(2022,1,1,12,0,0), close_price=47686.81)


# QUERY BY ASSET AND TIMESTAMP RANGE
AssetPrice.query.filter(
    asset="Bitcoin", 
    timestamp__gte=datetime(2022,1,1), timestamp__lt=datetime(2022,1,2),
    values=('close_price',)
)

>>> [{'close_price': 47686.81}, ..]

It's available with an easy pip install:

pip install popoto

Documentation here: https://popoto.readthedocs.io

Popoto is free and open source, so feel free to copy it and customize it for your own needs.

Upvotes: 0

SdSaati

Reputation: 910

You may want to check out RethinkDB: it gives you speed, reliability, and flexible querying, and it has a good Python driver. I also recommend using Docker, because then, regardless of which database you pick, you can store the database's data inside a folder on the host and move that folder at any time (for example, when you outgrow a 1 TB drive and want to switch to a 4 TB one). Using Docker in your project may matter even more than the choice of database.
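A minimal sketch of what that could look like with the official rethinkdb Python driver (the database/table names and the secondary index are my assumptions, not part of the question):

from datetime import datetime, timezone

from rethinkdb import RethinkDB

r = RethinkDB()
conn = r.connect(host="localhost", port=28015)

# One-time setup: a table with a secondary index on the timestamp.
r.db_create("market").run(conn)
r.db("market").table_create("prices").run(conn)
r.db("market").table("prices").index_create("timestamp").run(conn)
r.db("market").table("prices").index_wait("timestamp").run(conn)

prices = r.db("market").table("prices")

# Insert each observation as it streams in (the driver wants tz-aware datetimes).
prices.insert({
    "timestamp": datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc),
    "open": 47600.0, "high": 47700.0, "low": 47550.0,
    "close": 47686.81, "volume": 123.4,
}).run(conn)

# Fetch only the most recent 500 observations via the index, newest first.
latest = list(prices.order_by(index=r.desc("timestamp")).limit(500).run(conn))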

Upvotes: 0

v.alisauskaite

Reputation: 23

I would also suggest using a database: it is much more convenient to update tables in a DB than a .csv file, and if you have a substantial number of observations you will be able to access and manipulate your data much faster.
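Even the standard-library sqlite3 module is enough for this. Here is a minimal sketch (file, table, and column names are just placeholders); because date is the primary key, fetching the latest 500 rows is a cheap indexed query instead of a full-file read:

import sqlite3

import pandas as pd

conn = sqlite3.connect("prices.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS prices (
        date   TEXT PRIMARY KEY,  -- ISO-8601 strings sort chronologically
        open   REAL, high REAL, low REAL, close REAL, volume REAL
    )
""")

# Append each minutely update instead of rewriting a growing CSV.
conn.execute(
    "INSERT OR REPLACE INTO prices VALUES (?, ?, ?, ?, ?, ?)",
    ("2022-01-01T12:00:00", 47600.0, 47700.0, 47550.0, 47686.81, 123.4),
)
conn.commit()

# Read only the latest 500 observations, straight into a DataFrame.
df = pd.read_sql_query("SELECT * FROM prices ORDER BY date DESC LIMIT 500", conn)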

Another solution is to keep separate updates in separate .csv files. You can still keep your main file (the one that is regularly updated) and at the same time create separate files for each update.
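For example, a sketch of that idea with one file per day (the folder and file layout are my assumptions); fetching recent observations then only has to touch the newest file(s) instead of the whole history:

import csv
from pathlib import Path

DATA_DIR = Path("prices")           # hypothetical folder for the per-day files
DATA_DIR.mkdir(exist_ok=True)

def append_row(row):
    # Route each observation to a file named after its date,
    # e.g. prices/2022-01-01.csv, instead of one ever-growing file.
    day_file = DATA_DIR / (row[0][:10] + ".csv")
    with day_file.open("a", newline="") as f:
        csv.writer(f).writerow(row)

append_row(["2022-01-01T12:00:00", 47600.0, 47700.0, 47550.0, 47686.81, 123.4])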

Upvotes: 1
