Reputation: 2955
I describe the outcome of a strategy with numerous rows. Each row contains a symbol (describing an asset), a timestamp (think of a backtest), a price and a weight. Before a strategy runs, I delete all previous results for that particular strategy (I have many strategies). I then loop over all symbols and all times.
import math

# delete all previous data written by this strategy
StrategyRow.objects.filter(strategy=strategy).delete()

for symbol in symbols.keys():
    s = symbols[symbol]
    for t in portfolio.prices.index:
        p = prices[symbol][t]
        w = weights[symbol][t]
        row = StrategyRow.objects.create(strategy=strategy, symbol=s, time=t)
        if not math.isnan(p):
            row.price = p
        if not math.isnan(w):
            row.weight = w
        row.save()
This works but is very, very slow. Is there a chance to achieve the same with write_frame from pandas? Or maybe by using faster raw SQL?
Upvotes: 1
Views: 975
Reputation: 55448
I don't think the first thing you should try is the raw SQL route (more on that in a bit). The slowness most likely comes from calling row.save() on many objects; that operation is known to be slow.
I'd look into StrategyRow.objects.bulk_create() first: https://docs.djangoproject.com/en/1.7/ref/models/querysets/#django.db.models.query.QuerySet.bulk_create

The difference is that you pass it a list of your StrategyRow instances instead of calling .save() on each one individually. It's pretty straightforward: bundle up a number of rows, then create them in batches; try 10, 20, or 100 at a time. Your database configuration can also help you find the optimal batch size (e.g. http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_max_allowed_packet).
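A minimal sketch of how the loop from your question could be rewritten around bulk_create (it keeps your symbols, prices, weights and portfolio variables as given; batch_size=100 is just a starting point to tune, and note that bulk_create skips the model's save() method and signals):

import math

# build plain, unsaved model instances in memory ...
rows = []
for symbol, s in symbols.items():
    for t in portfolio.prices.index:
        row = StrategyRow(strategy=strategy, symbol=s, time=t)
        p = prices[symbol][t]
        w = weights[symbol][t]
        if not math.isnan(p):
            row.price = p
        if not math.isnan(w):
            row.weight = w
        rows.append(row)

# ... then insert them in batches of 100 instead of one query per row
StrategyRow.objects.bulk_create(rows, batch_size=100)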
Back to your idea of raw SQL: that would make a difference if, for example, the Python code that creates the StrategyRow instances is slow (e.g. StrategyRow.objects.create()), but I still believe the key is to batch-insert the rows instead of running N queries.
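If you do end up going raw, a sketch of a batched insert through Django's database connection could look roughly like this (the table name myapp_strategyrow, its column names, and the _id foreign-key columns are guesses based on Django's default naming conventions, so check them against your actual schema; this also bypasses model validation and signals entirely):

import math

from django.db import connection

# one parameter tuple per row; NaN becomes NULL (None)
params = [
    (strategy.id, symbols[symbol].id, t,
     None if math.isnan(prices[symbol][t]) else prices[symbol][t],
     None if math.isnan(weights[symbol][t]) else weights[symbol][t])
    for symbol in symbols
    for t in portfolio.prices.index
]

with connection.cursor() as cursor:
    # executemany runs the parameterized INSERT once per tuple;
    # some drivers batch these into fewer round trips under the hood
    cursor.executemany(
        "INSERT INTO myapp_strategyrow (strategy_id, symbol_id, time, price, weight) "
        "VALUES (%s, %s, %s, %s, %s)",
        params,
    )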
Upvotes: 1