Reputation: 743
I am building a report generator that requests certain information through an API and then parse(s)/writes the response to a CSV file.
I have all this working fine.
The trouble lies when there is an 'empty' field in the API response. This causes the columns in the CSV to 'collapse'.
Example
URL CATEGORY USER
URL USER
URL CATEGORY USER
Heres an example of the raw API response containing Two lines (The line begins with {"url"). The second is missing 'sales_tax_value' so would some how need an empty field inserting to compensate:
{
"expenses": [
{
"url": "xxxxxx",
"user": " xxxxxx ",
"project": " xxxxxx ",
"category": "xxxxxx ",
"dated_on": "xxxxxx ",
"currency": "xxxxxx ",
"gross_value": "xxxxxx",
"native_gross_value": " xxxxxx ",
"sales_tax_rate": " xxxxxx ",
"sales_tax_value": " xxxxxx ",
"native_sales_tax_value": " xxxxxx",
"description": " xxxxxx ",
"mileage": " xxxxxx ",
"engine_size_index": "xxxxxx",
"engine_type_index": "xxxxxx",
"vehicle_type": "xxxxxx",
"initial_rate_mileage": " xxxxxx ",
"have_vat_receipt": "xxxxxx",
"updated_at": " xxxxxx ",
"created_at": " xxxxxx "
},
{
"url": "xxxxxx",
"user": " xxxxxx ",
"project": " xxxxxx ",
"category": "xxxxxx ",
"dated_on": " xxxxxx ",
"currency": " xxxxxx ",
"gross_value": " xxxxxx ",
"native_gross_value": " xxxxxx ",
"sales_tax_rate": " xxxxxx ",
"description": " xxxxxx ",
"mileage": " xxxxxx ",
"engine_size_index": "xxxxxx",
"engine_type_index": "xxxxxx",
"vehicle_type": " xxxxxx ",
"initial_rate_mileage": " xxxxxx ",
"have_vat_receipt": "xxxxxx",
"updated_at": " xxxxxx ",
"created_at": " xxxxxx "
}
]
}
I have tried a long number of 'if' statements to check if each 'header' is present and if not append it to the line but this has so far been unsuccessful. This has resulted in creating some large (1.1GB) CSV files. I have put this down to poorly written loops on my part.
Any pointers would be much appreciated and if any further details are needed please let me know.
Upvotes: 1
Views: 696
Reputation: 1121962
If you use the csv.DictWriter()
class, you can pass your dictionaries directly to the writer, and it'll handle missing keys for you:
import csv
with open(outputfilename, 'wb') as outfh:
writer = csv.DictWriter(outfh, fieldnames=('url', 'category', 'user'), extrasaction='ignore')
writer.writerows(data['expenses'])
Note:
fieldnames
argument specifies which keys csv.DictWriter()
will use to write to the CSV file.extrasaction='ignore'
keyword argument tells the writer to just ignore any other keys in the dictionary.restval
argument, which defaults to the empty string. This means that if the 'category' key is missing, the column will still be added to the output file as an empty value.expenses
list in one go to writer.writerows()
.Upvotes: 1
Reputation: 142156
Declare of list of the fields you want to be output to the CSV file:
fields = ['url', 'category', 'user']
Then loop over the items in the response, and get the value or default it:
for expense in response['expenses']:
data = [expense.get(field, '') for field in fields]
# do whatever
Upvotes: 1