Python Report Builder

Question

I am building a report generator that requests certain information through an API and then parse(s)/writes the response to a CSV file.

I have all this working fine.

The trouble lies when there is an 'empty' field in the API response. This causes the columns in the CSV to 'collapse'.

Example

URL  CATEGORY  USER
URL  USER
URL  CATEGORY  USER

Heres an example of the raw API response containing Two lines (The line begins with {"url"). The second is missing 'sales_tax_value' so would some how need an empty field inserting to compensate:

{
    "expenses": [
        {
            "url": "xxxxxx",
            "user": " xxxxxx ",
            "project": " xxxxxx ",
            "category": "xxxxxx ",
            "dated_on": "xxxxxx ",
            "currency": "xxxxxx ",
            "gross_value": "xxxxxx",
            "native_gross_value": " xxxxxx ",
            "sales_tax_rate": " xxxxxx ",
            "sales_tax_value": " xxxxxx ",
            "native_sales_tax_value": " xxxxxx",
            "description": " xxxxxx ",
            "mileage": " xxxxxx ",
            "engine_size_index": "xxxxxx",
            "engine_type_index": "xxxxxx",
            "vehicle_type": "xxxxxx",
            "initial_rate_mileage": " xxxxxx ",
            "have_vat_receipt": "xxxxxx",
            "updated_at": " xxxxxx ",
            "created_at": " xxxxxx "
        },
        {
            "url": "xxxxxx",
            "user": " xxxxxx ",
            "project": " xxxxxx ",
            "category": "xxxxxx ",
            "dated_on": " xxxxxx ",
            "currency": " xxxxxx ",
            "gross_value": " xxxxxx ",
            "native_gross_value": " xxxxxx ",
            "sales_tax_rate": " xxxxxx ",
            "description": " xxxxxx ",
            "mileage": " xxxxxx ",
            "engine_size_index": "xxxxxx",
            "engine_type_index": "xxxxxx",
            "vehicle_type": " xxxxxx ",
            "initial_rate_mileage": " xxxxxx ",
            "have_vat_receipt": "xxxxxx",
            "updated_at": " xxxxxx ",
            "created_at": " xxxxxx "
        }
    ]
}

I have tried a long number of 'if' statements to check if each 'header' is present and if not append it to the line but this has so far been unsuccessful. This has resulted in creating some large (1.1GB) CSV files. I have put this down to poorly written loops on my part.

Any pointers would be much appreciated and if any further details are needed please let me know.

Martijn Pieters · Accepted Answer

If you use the csv.DictWriter() class, you can pass your dictionaries directly to the writer, and it'll handle missing keys for you:

 import csv

 with open(outputfilename, 'wb') as outfh:
     writer = csv.DictWriter(outfh, fieldnames=('url', 'category', 'user'), extrasaction='ignore')
     writer.writerows(data['expenses'])

Note:

The fieldnames argument specifies which keys csv.DictWriter() will use to write to the CSV file.
The extrasaction='ignore' keyword argument tells the writer to just ignore any other keys in the dictionary.
Any missing keys in the input dictionary use the restval argument, which defaults to the empty string. This means that if the 'category' key is missing, the column will still be added to the output file as an empty value.
You can just pass in the whole expenses list in one go to writer.writerows().

Python Report Builder

Answers (2)

Related Questions