Reputation: 2248
I have a model in django called "Sample" I want to query and return a large number of rows ~ 100k based on filters. However, it's taking up to 4-5 seconds to return the response and I was wondering whether I could make it faster.
(Need to improve converting from queryset to df to response json. Not querying from DB)
My current code looks like this:
@api_view(['POST'])
def retrieve_signal_asset_weight_ts_by_signal(request):
#code to get item.id here based on request
qs = Sample.objects.filter(
data_date__range=[start_date, end_date],
item__id = item.id).values(*columns_required)
df = pd.DataFrame(list(qs), columns=columns_required)
response = df .to_json(orient='records')
return Response(response, status=status.HTTP_200_OK)
Based on multiple test cases -- I've noticed that the slow part isn't actually getting the data from DB, it's converting it to a DataFrame and then returning as JSON.
It's actually taking about 2 seconds just for this part df = pd.DataFrame(list(qs), columns=columns_required)
. Im looking for a faster way to convert queryset to a json which I can send as part of my "response" object!
Based on this link I've tried other methods including django-pandas
and using .values_list()
but they seem to be slower than this, and I noticed many of the answers are quite old so I was wondering whether Django 3 has anything to make it faster.
Thanks
Django version : 3.2.6
Upvotes: 0
Views: 1831
Reputation: 120409
With your code, you can't write:
(Need to improve converting from queryset to df to response json. Not querying from DB)
It's actually taking about 2 seconds just for this part
df = pd.DataFrame(list(qs), columns=columns_required)
Get data from database is a lazy operation, so the query will be executed only when data is needed list(qs)
. According to the documentation:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve any database activity. You can stack filters together all day long, and Django won’t actually run the query until the QuerySet is evaluated. Take a look at this example:
Try to separate operation:
records = list(qs)
df = pd.DataFrame(records, columns=columns_required))
Now, you can determine which operation is time-consuming.
Maybe, you look at StreamingHttpResponse
Upvotes: 2