Karl
Karl

Reputation: 743

Python Report Builder

I am building a report generator that requests certain information through an API and then parse(s)/writes the response to a CSV file.

I have all this working fine.

The trouble lies when there is an 'empty' field in the API response. This causes the columns in the CSV to 'collapse'.

Example

URL  CATEGORY  USER
URL  USER
URL  CATEGORY  USER  

Heres an example of the raw API response containing Two lines (The line begins with {"url"). The second is missing 'sales_tax_value' so would some how need an empty field inserting to compensate:

{
    "expenses": [
        {
            "url": "xxxxxx",
            "user": " xxxxxx ",
            "project": " xxxxxx ",
            "category": "xxxxxx ",
            "dated_on": "xxxxxx ",
            "currency": "xxxxxx ",
            "gross_value": "xxxxxx",
            "native_gross_value": " xxxxxx ",
            "sales_tax_rate": " xxxxxx ",
            "sales_tax_value": " xxxxxx ",
            "native_sales_tax_value": " xxxxxx",
            "description": " xxxxxx ",
            "mileage": " xxxxxx ",
            "engine_size_index": "xxxxxx",
            "engine_type_index": "xxxxxx",
            "vehicle_type": "xxxxxx",
            "initial_rate_mileage": " xxxxxx ",
            "have_vat_receipt": "xxxxxx",
            "updated_at": " xxxxxx ",
            "created_at": " xxxxxx "
        },
        {
            "url": "xxxxxx",
            "user": " xxxxxx ",
            "project": " xxxxxx ",
            "category": "xxxxxx ",
            "dated_on": " xxxxxx ",
            "currency": " xxxxxx ",
            "gross_value": " xxxxxx ",
            "native_gross_value": " xxxxxx ",
            "sales_tax_rate": " xxxxxx ",
            "description": " xxxxxx ",
            "mileage": " xxxxxx ",
            "engine_size_index": "xxxxxx",
            "engine_type_index": "xxxxxx",
            "vehicle_type": " xxxxxx ",
            "initial_rate_mileage": " xxxxxx ",
            "have_vat_receipt": "xxxxxx",
            "updated_at": " xxxxxx ",
            "created_at": " xxxxxx "
        }
    ]
}

I have tried a long number of 'if' statements to check if each 'header' is present and if not append it to the line but this has so far been unsuccessful. This has resulted in creating some large (1.1GB) CSV files. I have put this down to poorly written loops on my part.

Any pointers would be much appreciated and if any further details are needed please let me know.

Upvotes: 1

Views: 696

Answers (2)

Martijn Pieters
Martijn Pieters

Reputation: 1121962

If you use the csv.DictWriter() class, you can pass your dictionaries directly to the writer, and it'll handle missing keys for you:

 import csv

 with open(outputfilename, 'wb') as outfh:
     writer = csv.DictWriter(outfh, fieldnames=('url', 'category', 'user'), extrasaction='ignore')
     writer.writerows(data['expenses'])

Note:

  1. The fieldnames argument specifies which keys csv.DictWriter() will use to write to the CSV file.
  2. The extrasaction='ignore' keyword argument tells the writer to just ignore any other keys in the dictionary.
  3. Any missing keys in the input dictionary use the restval argument, which defaults to the empty string. This means that if the 'category' key is missing, the column will still be added to the output file as an empty value.
  4. You can just pass in the whole expenses list in one go to writer.writerows().

Upvotes: 1

Jon Clements
Jon Clements

Reputation: 142156

Declare of list of the fields you want to be output to the CSV file:

fields = ['url', 'category', 'user']

Then loop over the items in the response, and get the value or default it:

for expense in response['expenses']:
    data = [expense.get(field, '') for field in fields]
    # do whatever

Upvotes: 1

Related Questions