Reputation: 1949
I have a standard DRF web application that outputs CSV data on one of its routes. The data set is quite large, so rendering the entire CSV representation takes a while, and I want to stream the HTTP response so the client doesn't time out.
However, using the example provided in https://github.com/mjumbewu/django-rest-framework-csv/blob/2ff49cff4b81827f3f450fd7d56827c9671c5140/rest_framework_csv/renderers.py#L197 doesn't quite accomplish this. The data is still sent as one large payload instead of being chunked, and the client ends up waiting for the whole response before any bytes are received.
The structure is similar to what follows:
models.py
class Report(models.Model):
    count = models.PositiveIntegerField(blank=True)
    ...
renderers.py
class ReportCSVRenderer(CSVStreamingRenderer):
    header = ['count']
serializers.py
class ReportSerializer(serializers.ModelSerializer):
    count = fields.IntegerField()

    class Meta:
        model = Report
views.py
class ReportCSVView(viewsets.GenericViewSet, mixins.ListModelMixin):
    def get_queryset(self):
        return Report.objects.all()

    def list(self, request, *args, **kwargs):
        queryset = self.get_queryset()
        data = ReportSerializer(queryset, many=True)
        renderer = ReportCSVRenderer()
        response = StreamingHttpResponse(renderer.render(data), content_type='text/csv')
        response['Content-Disposition'] = 'attachment; filename="f.csv"'
        return response
NOTE: I had to comment out or change some things.
Thank you
Upvotes: 12
Views: 13248
Reputation: 61
A simpler solution, inspired by @3066d0's answer:
renderers.py
class ReportsRenderer(CSVStreamingRenderer):
    header = [ ... ]
    labels = { ... }
views.py
from django.core.paginator import Paginator
from django.http import StreamingHttpResponse
from rest_framework.mixins import ListModelMixin
from rest_framework.viewsets import GenericViewSet

class ReportCSVViewset(ListModelMixin, GenericViewSet):
    queryset = Report.objects.select_related('stuff')
    serializer_class = ReportCSVSerializer
    renderer_classes = [ReportsRenderer]
    PAGE_SIZE = 1000

    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())
        response = StreamingHttpResponse(
            request.accepted_renderer.render(self._stream_serialized_data(queryset)),
            status=200,
            content_type="text/csv",
        )
        response["Content-Disposition"] = 'attachment; filename="reports.csv"'
        return response

    def _stream_serialized_data(self, queryset):
        # Serialize one page at a time so the whole queryset is never
        # serialized up front.
        serializer = self.get_serializer_class()
        paginator = Paginator(queryset, self.PAGE_SIZE)
        for page in paginator.page_range:
            yield from serializer(paginator.page(page).object_list, many=True).data
The point is that you need to pass a generator that yields serialized data as the data argument to the renderer; the CSVStreamingRenderer then does its thing and streams the response itself. I prefer this approach because you do not need to override the code of a third-party library.
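To see the idea in isolation, here is a minimal sketch in plain Python with no Django or DRF involved. The names Echo, iter_pages and stream_csv are made up for illustration; iter_pages stands in for the Paginator plus serializer step, and stream_csv stands in for what a streaming CSV renderer does with the generator it receives.

```python
import csv
from typing import Dict, Iterable, Iterator, List


class Echo:
    """Pseudo-buffer: write() hands the value back instead of storing it."""

    def write(self, value: str) -> str:
        return value


def iter_pages(rows: List[Dict[str, object]], page_size: int) -> Iterator[Dict[str, object]]:
    """Yield rows one page at a time (stand-in for Paginator + serializer)."""
    for start in range(0, len(rows), page_size):
        yield from rows[start:start + page_size]


def stream_csv(rows: Iterable[Dict[str, object]], header: List[str]) -> Iterator[str]:
    """Yield one CSV line per input row, starting with the header line."""
    writer = csv.writer(Echo())
    # writerow() returns whatever Echo.write() returns, i.e. the CSV line.
    yield writer.writerow(header)
    for row in rows:
        yield writer.writerow([row.get(col, "") for col in header])
```

Because both functions are generators, nothing is serialized until the consumer (in the real view, StreamingHttpResponse) asks for the next chunk.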
Upvotes: 6
Reputation: 9837
You need to provide the CSV headers (via the header param) when rendering the data:
renderer.render(data, renderer_context={'header': ['header1', 'header2', 'header3']})
If you don't specify the header parameter, djangorestframework-csv will attempt to "guess" the CSV headers by itself. To "guess" the CSV headers, djangorestframework-csv will load all your data in memory, resulting in the delay you are experiencing.
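The effect can be illustrated with a simplified stand-in written in plain Python. This is not the library's actual code; rows_with_header is a hypothetical function that only mimics the behaviour described above: with an explicit header the input stays lazy, and without one the whole input has to be consumed before the first row can be produced.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def rows_with_header(
    data: Iterable[Dict[str, object]],
    header: Optional[List[str]] = None,
) -> Iterator[List[object]]:
    """Yield a header row followed by one row per item in data."""
    if not header:
        # Guessing the header means looking at every item first,
        # so the whole iterable is pulled into memory here.
        data = tuple(data)
        keys = set()
        for item in data:
            keys.update(item.keys())
        header = sorted(keys)
    yield header
    for item in data:
        yield [item.get(key) for key in header]
```

With header=['a', 'b'] the first row is available immediately; without it, the first row only arrives after the entire source has been read.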
Upvotes: 0
Reputation: 615
Django's StreamingHttpResponse can be much slower than a traditional HttpResponse for small responses.
Don't use it if you don't need to; the Django docs actually recommend that "StreamingHttpResponse should only be used in situations where it is absolutely required that the whole content isn't iterated before transferring the data to the client."
For your problem, you may also find it useful to set the chunk_size, switch to FileResponse, or go back to a normal Response (if using the REST framework) or HttpResponse.
Edit 1: About setting the chunk size:
With the File API you can read a file in chunks, so the whole file is never loaded into memory at once.
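As a rough sketch of what reading in chunks means, here is a plain-Python version that does not use Django at all. read_in_chunks is a hypothetical helper; Django's own File objects offer a chunks() method for the same purpose.

```python
import io
from functools import partial
from typing import BinaryIO, Iterator


def read_in_chunks(fileobj: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    # iter(callable, sentinel) keeps calling read() until it returns b"".
    for chunk in iter(partial(fileobj.read, chunk_size), b""):
        yield chunk


# Example with an in-memory file and a tiny chunk size.
example_chunks = list(read_in_chunks(io.BytesIO(b"abcdefgh"), chunk_size=3))
```

A generator like this can be passed to a streaming response so that only one chunk is held in memory at a time.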
I hope you find this useful.
Upvotes: 2
Reputation: 1949
So I ended up coming to a solution I was happy with, using the Paginator class with the queryset. First, I wrote a renderer that subclasses CSVStreamingRenderer, then used it in my CSV viewset's renderer_classes.
renderers.py
import csv
import itertools

from django.core.paginator import Paginator
from rest_framework_csv.renderers import CSVStreamingRenderer


class Echo:
    """Pseudo-buffer whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


# *****************************************************************************
# BatchedCSVRenderer
# *****************************************************************************
class BatchedCSVRenderer(CSVStreamingRenderer):
    """
    A CSV renderer that works with large querysets by returning a generator.
    Used with a streaming HTTP response, it sends response bytes as they are
    produced instead of making the client wait for a long period of time.
    """

    def render(self, data, media_type=None, renderer_context=None, *args, **kwargs):
        # Avoid a mutable default argument for renderer_context.
        renderer_context = renderer_context or {}
        if 'queryset' not in data:
            # This method is a generator, so returning here ends the stream
            # without emitting any rows.
            return

        csv_buffer = Echo()
        csv_writer = csv.writer(csv_buffer)

        queryset = data['queryset']
        serializer = data['serializer']
        paginator = Paginator(queryset, 50)

        # rendering the header or label field was taken from the tablize
        # method in django rest framework csv
        header = renderer_context.get('header', self.header)
        labels = renderer_context.get('labels', self.labels)

        if labels:
            yield csv_writer.writerow([labels.get(x, x) for x in header])
        else:
            yield csv_writer.writerow(header)

        for page in paginator.page_range:
            serialized = serializer(
                paginator.page(page).object_list, many=True
            ).data

            # we use the tablize function on the parent class to get a
            # generator that we can use to yield a row
            table = self.tablize(
                serialized,
                header=header,
                labels=labels,
            )

            # we want to remove the header from the tablized data so we use
            # islice to take from 1 to the end of generator
            for row in itertools.islice(table, 1, None):
                yield csv_writer.writerow(row)
# *****************************************************************************
# ReportsRenderer
# *****************************************************************************
class ReportsRenderer(BatchedCSVRenderer):
    """
    A renderer for returning CSV data for reports
    """
    header = [ ... ]
    labels = { ... }
views.py
from django.http import StreamingHttpResponse
from rest_framework import mixins, viewsets
# *****************************************************************************
# CSVViewSet
# *****************************************************************************
class CSVViewSet(
    mixins.ListModelMixin,
    viewsets.GenericViewSet,
):
    def list(self, request, *args, **kwargs):
        queryset = self.get_queryset()
        return StreamingHttpResponse(
            request.accepted_renderer.render({
                'queryset': queryset,
                'serializer': self.get_serializer_class(),
            }),
            content_type='text/csv',
        )
# *****************************************************************************
# ReportsViewset
# *****************************************************************************
class ReportCSVViewset(CSVViewSet):
    """
    Viewset for report CSV output
    """
    renderer_classes = [ReportsRenderer]
    serializer_class = serializers.ReportCSVSerializer

    def get_queryset(self):
        queryset = Report.objects.filter(...)
        return queryset
This might seem like a lot for a streaming response, but we used the BatchedCSVRenderer and CSVViewSet in a bunch of other places. If you're running your server behind nginx, it might also be useful to adjust the settings there to allow streaming responses, since response buffering in the proxy (for example the proxy_buffering directive) can hold chunks back from the client.
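The header handling in the renderer (write the header once, then drop the header row that comes back for every page) can be sketched in plain Python as follows. tablize_page is a hypothetical stand-in for the parent class's tablize method, assumed here to yield a header row followed by the data rows.

```python
import itertools
from typing import Dict, Iterable, Iterator, List, Sequence


def tablize_page(page: Sequence[Dict[str, object]], header: List[str]) -> Iterator[list]:
    """Hypothetical stand-in for tablize: header row first, then data rows."""
    yield list(header)
    for item in page:
        yield [item.get(col) for col in header]


def batched_rows(pages: Iterable[Sequence[Dict[str, object]]], header: List[str]) -> Iterator[list]:
    """Yield the header once, then the data rows of every page."""
    yield list(header)
    for page in pages:
        # islice(..., 1, None) skips the per-page header row.
        yield from itertools.islice(tablize_page(page, header), 1, None)


example_pages = [[{"count": 1}, {"count": 2}], [{"count": 3}]]
example_rows = list(batched_rows(example_pages, ["count"]))
```

Without the islice step, every page of 50 objects would repeat the header line in the middle of the CSV output.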
Hopefully this helps anyone having the same goal. Let me know if there's any other information I can provide.
Upvotes: 0