Reputation: 401
I am using Django's StreamingHttpResponse to stream a large CSV file on the fly. According to the docs, an iterator is passed to the response's streaming_content parameter:
import csv

from django.http import StreamingHttpResponse


def get_headers():
    return ['field1', 'field2', 'field3']


def get_data(item):
    return {
        'field1': item.field1,
        'field2': item.field2,
        'field3': item.field3,
    }


# The csv writer requires a file-like object that has a 'write' method;
# this one returns the value instead of storing it
class Echo(object):
    def write(self, value):
        return value


def get_response(queryset):
    writer = csv.DictWriter(Echo(), fieldnames=get_headers())
    writer.writeheader()  # this line does not work
    response = StreamingHttpResponse(
        # the iterator
        streaming_content=(writer.writerow(get_data(item)) for item in queryset),
        content_type='text/csv',
    )
    response['Content-Disposition'] = 'attachment;filename=items.csv'
    return response
My question is: how can I manually write a row with the CSV writer? Manually calling writer.writerow(data) or writer.writeheader() (which also calls writerow() internally) does not seem to write anything to the output; only the rows generated and streamed through streaming_content end up in the downloaded file.
Upvotes: 8
Views: 4984
Reputation: 3890
You can chain generators with itertools.chain to put a header row in front of the queryset rows.
Here is how you do it:
import csv
import itertools

from django.http import StreamingHttpResponse

# Echo is the pseudo-buffer class from the question; Item is your own model.


def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    headers = [["title 1", "title 2"]]
    row_titles = (header for header in headers)  # title generator
    items = Item.objects.all()
    rows = (["Row {}".format(item.pk), str(item.pk)] for item in items)
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    rows = itertools.chain(row_titles, rows)  # merge the two generators
    return StreamingHttpResponse(
        (writer.writerow(row) for row in rows),
        content_type="text/csv",
        # the headers argument requires Django 3.2 or later
        headers={'Content-Disposition': 'attachment; filename="somefilename.csv"'},
    )
You will get a CSV with the title row followed by the queryset rows, for example for items with primary keys 1 and 2:
title 1,title 2
Row 1,1
Row 2,2
...
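If you want to check the chaining idea without a Django project, here is a minimal sketch using only the standard library. The stream_csv helper and the hard-coded primary keys are stand-ins I made up for illustration; in the real view the rows would come from the queryset.

```python
import csv
import itertools


class Echo:
    """Pseudo-buffer: write() returns the value instead of storing it."""

    def write(self, value):
        return value


def stream_csv(header, rows):
    """Yield CSV lines one at a time, header first (hypothetical helper)."""
    writer = csv.writer(Echo())
    # chain() puts the single header row in front of the lazy data rows
    for row in itertools.chain([header], rows):
        # writerow() returns whatever Echo.write() returns: the formatted line
        yield writer.writerow(row)


# Stand-in for the rows built from the queryset
data = (["Row {}".format(pk), str(pk)] for pk in (1, 2))
lines = list(stream_csv(["title 1", "title 2"], data))
print("".join(lines), end="")
```

Each yielded string is one complete CSV line including the line terminator, which is the shape of iterator that StreamingHttpResponse expects.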
Upvotes: 1
Reputation: 1642
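Yielding the header list straight from the pseudo buffer (yield pseudo_buffer.write(get_headers())) can actually lead to incorrect/mismatched CSVs (header mismatched with data), because the header never goes through the csv writer. You'd want to write the header as a regular row instead, with something like:
    header = dict(zip(fieldnames, fieldnames))
    yield writer.writerow(header)
This is taken from the implementation of writeheader:
https://github.com/python/cpython/blob/08045391a7aa87d4fbd3e8ef4c852c2fa4e81a8a/Lib/csv.py#L141:L143
Calling yield writer.writeheader() directly does not behave well before Python 3.8, because writeheader() there discards the return value of writerow() and returns None.
Hope this helps someone in the future :)
Also note that no workaround is needed on Python 3.8+, where writeheader() returns the value from writerow(), because of this change: https://bugs.python.org/issue27497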
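A small sketch to see the difference for yourself, assuming the same Echo pseudo buffer as in the question. The field names are just the ones from the question; nothing here needs Django.

```python
import csv


class Echo:
    """Pseudo-buffer: write() returns the value instead of storing it."""

    def write(self, value):
        return value


fieldnames = ["field1", "field2", "field3"]
writer = csv.DictWriter(Echo(), fieldnames=fieldnames)

# Works on every Python version: the header is written as an ordinary row,
# so writerow() hands back the formatted line
header_line = writer.writerow(dict(zip(fieldnames, fieldnames)))

# Version dependent: None before Python 3.8, the formatted line on 3.8+
maybe_header = writer.writeheader()

print(repr(header_line))
print(repr(maybe_header))
```

If the second print shows None, you are on an older Python and need the dict(zip(...)) form inside the generator.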
Upvotes: 4
Reputation: 401
The answer is to yield the results from a generator function instead of computing them inline (in StreamingHttpResponse's streaming_content argument), and to use the pseudo buffer we created (the Echo class) so that each written row is returned to the response:
import csv

from django.http import StreamingHttpResponse


def get_headers():
    return ['field1', 'field2', 'field3']


def get_data(item):
    return {
        'field1': item.field1,
        'field2': item.field2,
        'field3': item.field3,
    }


# The csv writer requires a file-like object that has a 'write' method;
# this one returns the value instead of storing it
class Echo(object):
    def write(self, value):
        return value


def iter_items(items, pseudo_buffer):
    fieldnames = get_headers()
    writer = csv.DictWriter(pseudo_buffer, fieldnames=fieldnames)
    # Write the header as a regular row so it goes through the csv writer
    # (yielding pseudo_buffer.write(fieldnames) would emit a raw list)
    yield writer.writerow(dict(zip(fieldnames, fieldnames)))
    for item in items:
        yield writer.writerow(get_data(item))


def get_response(queryset):
    response = StreamingHttpResponse(
        streaming_content=iter_items(queryset, Echo()),
        content_type='text/csv',
    )
    response['Content-Disposition'] = 'attachment;filename=items.csv'
    return response
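To convince yourself that the generator really produces one line at a time, here is a rough Django-free sketch of the same idea. SimpleNamespace objects stand in for model instances and FIELDNAMES replaces get_headers(); both are assumptions for the demo, not part of the view code.

```python
import csv
from types import SimpleNamespace

FIELDNAMES = ["field1", "field2", "field3"]


class Echo:
    """Pseudo-buffer: write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_items(items, pseudo_buffer):
    writer = csv.DictWriter(pseudo_buffer, fieldnames=FIELDNAMES)
    # Header first, written as an ordinary row
    yield writer.writerow(dict(zip(FIELDNAMES, FIELDNAMES)))
    for item in items:
        yield writer.writerow({name: getattr(item, name) for name in FIELDNAMES})


# Stand-ins for model instances
items = [
    SimpleNamespace(field1="a", field2="b, c", field3=3),
    SimpleNamespace(field1="d", field2="e", field3=6),
]

stream = iter_items(items, Echo())
first = next(stream)   # only the header has been formatted at this point
rest = list(stream)    # the data rows are formatted as they are consumed
print(repr(first))
print(rest)
```

Note that the value containing a comma comes out quoted, which is the reason to send every row, header included, through the csv writer.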
Upvotes: 16