Tags: python, django, rest, csv, django-rest-framework

Reputation: 1949

Using StreamingHttpResponse with Django Rest Framework CSV

I have a standard DRF web application that outputs CSV data for one of the routes. Rendering the entire CSV representation takes a while to do. The data set is quite large so I wanted to have a streaming HTTP response so the client doesn't time out.

However, using the example provided in https://github.com/mjumbewu/django-rest-framework-csv/blob/2ff49cff4b81827f3f450fd7d56827c9671c5140/rest_framework_csv/renderers.py#L197 doesn't quite accomplish this. The data still arrives as one large payload instead of being chunked, so the client waits a long time before it receives any bytes.

The structure is similar to what follows:

models.py

class Report(models.Model):
  count = models.PositiveIntegerField(blank=True)
  ...

renderers.py

class ReportCSVRenderer(CSVStreamingRenderer):
  header = ['count']

serializers.py

class ReportSerializer(serializers.ModelSerializer):
  count = fields.IntegerField()

  class Meta:
    model = Report
    fields = ['count']  # recent DRF versions require an explicit fields list

views.py

class ReportCSVView(mixins.ListModelMixin, viewsets.GenericViewSet):
  def get_queryset(self):
    return Report.objects.all()

  def list(self, request, *args, **kwargs):
    queryset = self.get_queryset()
    data = ReportSerializer(queryset, many=True)
    renderer = ReportCSVRenderer()

    response = StreamingHttpResponse(renderer.render(data), content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="f.csv"'

    return response

NOTE: had to comment out or change some things.

Thank you

Upvotes: 12

Answers (4)

Andrii Vityk

Reputation: 61

A simpler solution, inspired by @3066d0's answer:

renderers.py

class ReportsRenderer(CSVStreamingRenderer):
    header = [ ... ]
    labels = { ... }

views.py

class ReportCSVViewset(ListModelMixin, GenericViewSet):
    queryset = Report.objects.select_related('stuff')
    serializer_class = ReportCSVSerializer
    renderer_classes = [ReportsRenderer]
    PAGE_SIZE = 1000

    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())
        response = StreamingHttpResponse(
            request.accepted_renderer.render(self._stream_serialized_data(queryset)),
            status=200,
            content_type="text/csv",
        )
        response["Content-Disposition"] = 'attachment; filename="reports.csv"'
        return response

    def _stream_serialized_data(self, queryset):
        serializer = self.get_serializer_class()
        paginator = Paginator(queryset, self.PAGE_SIZE)
        for page in paginator.page_range:
            yield from serializer(paginator.page(page).object_list, many=True).data

The point is that you need to pass a generator that yields serialized data as the data argument to the renderer; the CSVStreamingRenderer then does its thing and streams the response itself. I prefer this approach because you do not need to override the code of a third-party library.
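To see why handing the renderer a generator matters, here is a minimal, framework-free sketch of the same pattern using only the standard library. The names `Echo`, `fetch_page`, `stream_rows` and `stream_csv` are hypothetical and exist only for this illustration; `fetch_page` stands in for serializing one page of a queryset. It is not the library's code, just the idea behind it.

```python
import csv


class Echo:
    """File-like object whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


FETCHED_PAGES = []  # records which pages were actually loaded


def fetch_page(page, page_size=2, total=5):
    """Hypothetical stand-in for serializing one page of a queryset."""
    FETCHED_PAGES.append(page)
    start = (page - 1) * page_size
    return [{"count": n} for n in range(start, min(start + page_size, total))]


def stream_rows(num_pages):
    """Yield serialized rows page by page, one page in memory at a time."""
    for page in range(1, num_pages + 1):
        yield from fetch_page(page)


def stream_csv(rows, header):
    """Yield one CSV line per row; nothing is rendered until it is requested."""
    writer = csv.writer(Echo())
    yield writer.writerow(header)
    for row in rows:
        yield writer.writerow([row[name] for name in header])


lines = stream_csv(stream_rows(num_pages=3), header=["count"])
first_two = [next(lines), next(lines)]  # header line plus the first data row
```

After pulling the header and one data row, only the first page has been fetched. The remaining pages are loaded as the consumer (in the real view, StreamingHttpResponse) keeps iterating.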

Upvotes: 6

spg

Reputation: 9837

You need to provide the CSV headers (via the header param) when rendering the data:

renderer.render(data, renderer_context={'header': ['header1', 'header2', 'header3']})

If you don't specify the header parameter, djangorestframework-csv will attempt to "guess" the CSV headers by itself. To "guess" the CSV headers, djangorestframework-csv will load all your data in memory, resulting in the delay you are experiencing.
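The difference can be illustrated with a simplified sketch. This is not djangorestframework-csv's actual implementation; `infer_header`, `counting_rows` and `table_with_fixed_header` are hypothetical helpers that only show why guessing the header forces the whole data set to be read, while a known header lets rows stay lazy.

```python
def infer_header(rows):
    """Collect every key seen in any row; this must read all rows first."""
    rows = list(rows)  # the whole data set is now in memory
    header = sorted({key for row in rows for key in row})
    return header, rows


def counting_rows(n, consumed):
    """Hypothetical lazy row source that records each row it produces."""
    for i in range(n):
        consumed.append(i)
        yield {"count": i}


def table_with_fixed_header(rows, header):
    """Yield the header immediately, then each row only when asked for."""
    yield header
    for row in rows:
        yield [row.get(name) for name in header]


consumed_guess = []
guessed_header, _ = infer_header(counting_rows(1000, consumed_guess))

consumed_fixed = []
table = table_with_fixed_header(counting_rows(1000, consumed_fixed), ["count"])
first_line = next(table)  # the header is available before any row is read
```

With the guessed header, all 1000 rows are consumed before anything can be sent. With the fixed header, the first line is ready while `consumed_fixed` is still empty.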

Upvotes: 0

ascoder

Reputation: 615

Django's StreamingHttpResponse can be much slower than a traditional HttpResponse for small responses.

Don't use it if you don't need to; the Django docs actually recommend that StreamingHttpResponse only be used in situations where it is absolutely required that the whole content isn't iterated before transferring the data to the client.

For your problem, you may also find it useful to set a chunk size, switch to FileResponse, or go back to a normal Response (if using REST framework) or HttpResponse.

Edit 1: About setting the chunk size:

With Django's File API you can read a file in chunks (File.chunks()), so the whole file is never loaded into memory at once.
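As a rough illustration of chunked reading, here is a standard-library analogue of what `File.chunks()` provides. The function name `read_in_chunks` and the 64 KiB default are assumptions made for this sketch, not Django's implementation.

```python
import io


def read_in_chunks(file_obj, chunk_size=64 * 1024):
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# A tiny in-memory "file" and a deliberately small chunk size for the demo.
payload = io.BytesIO(b"a,b\n1,2\n3,4\n")
chunks = list(read_in_chunks(payload, chunk_size=5))
```

A generator like this can be passed straight to a streaming response, so only one chunk is held in memory at a time.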

I hope you find this useful.

Upvotes: 2

blackeyebeefhorsefly

Reputation: 1949

So I ended up coming to a solution I was happy with using the Paginator class with the queryset. First, I wrote a renderer that subclassed the CSVStreamingRenderer, then used that in my CSVViewset's Renderer.

renderers.py

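import csv
import itertools

from django.core.paginator import Paginator
# Assumed location of Echo; if your installed version lacks it, define a
# small class whose write() simply returns its argument.
from rest_framework_csv.misc import Echo
from rest_framework_csv.renderers import CSVStreamingRenderer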

# *****************************************************************************
# BatchedCSVRenderer
# *****************************************************************************


class BatchedCSVRenderer(CSVStreamingRenderer):

    """
    a CSV renderer that works with large querysets returning a generator
    function. Used with a streaming HTTP response, it provides response bytes
    instead of the client waiting for a long period of time
    """

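    def render(self, data, media_type=None, renderer_context=None):
        # avoid a shared mutable default argument
        renderer_context = renderer_context or {}

        # this method is a generator, so a plain `return data` would discard
        # the data; hand anything that is not our dict to the parent instead
        if not isinstance(data, dict) or 'queryset' not in data:
            yield from super().render(data, media_type, renderer_context)
            return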

        csv_buffer = Echo()
        csv_writer = csv.writer(csv_buffer)

        queryset = data['queryset']
        serializer = data['serializer']

        paginator = Paginator(queryset, 50)

        #  rendering the header or label field was taken from the tablize
        #  method in django rest framework csv

        header = renderer_context.get('header', self.header)
        labels = renderer_context.get('labels', self.labels)

        if labels:
            yield csv_writer.writerow([labels.get(x, x) for x in header])
        else:
            yield csv_writer.writerow(header)

        for page in paginator.page_range:
            serialized = serializer(
                paginator.page(page).object_list, many=True
            ).data

            #  we use the tablize function on the parent class to get a
            #  generator that we can use to yield a row

            table = self.tablize(
                serialized,
                header=header,
                labels=labels,
            )

            #  we want to remove the header from the tablized data so we use
            #  islice to take from 1 to the end of generator

            for row in itertools.islice(table, 1, None):
                yield csv_writer.writerow(row)

# *****************************************************************************
# ReportsRenderer
# *****************************************************************************


class ReportsRenderer(BatchedCSVRenderer):

    """
    A renderer for returning CSV data for reports

    """

    header = [ ... ]
    labels = { ... }

views.py

from django.http import StreamingHttpResponse
from rest_framework import mixins, viewsets

# *****************************************************************************
# CSVViewSet
# *****************************************************************************


class CSVViewSet(
        mixins.ListModelMixin,
        viewsets.GenericViewSet,
):

    def list(self, request, *args, **kwargs):
        queryset = self.get_queryset()

        return StreamingHttpResponse(
            request.accepted_renderer.render({
                'queryset': queryset,
                'serializer': self.get_serializer_class(),
            }),
            content_type='text/csv',
        )

# *****************************************************************************
# ReportsViewset
# *****************************************************************************


class ReportCSVViewset(CSVViewSet):

    """
    Viewset for report CSV output

    """

    renderer_classes = [ReportsRenderer]
    serializer_class = serializers.ReportCSVSerializer

    def get_queryset(self):
        return Report.objects.filter(...)

This might seem like a lot for a streaming response, but we used the BatchedCSVRenderer and CSVViewSet in a bunch of other places. If you're running your server behind nginx, you may also need to adjust its buffering settings, since a proxy that buffers the whole upstream response defeats streaming.
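One per-response option, sketched below under the assumption that your nginx configuration does not ignore the header, is to send `X-Accel-Buffering: no`, which nginx reads as a request to skip buffering for that response. The helper name `disable_proxy_buffering` is hypothetical. It works on anything that supports item assignment for headers, which includes Django response objects; a plain dict is used here so the snippet runs on its own.

```python
def disable_proxy_buffering(response):
    """Set the header nginx reads to skip buffering for this one response."""
    response["X-Accel-Buffering"] = "no"
    return response


# Demo with a plain dict standing in for a response's headers.
headers = disable_proxy_buffering({"Content-Type": "text/csv"})
```

In a view, the same call would wrap the StreamingHttpResponse just before it is returned. Turning buffering off in the nginx config itself is the alternative if you want it for a whole location.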

Hopefully this helps anyone having the same goal. Let me know if there's any other information I can provide.

Upvotes: 0
