python trasform data from csv to array of dictionaries and group by field value

Question

I have csv like this:

id,company_name,country,country_id
1,batstop,usa, xx
2,biorice,italy, yy
1,batstop,italy, yy
3,legstart,canada, zz

I want an array of dictionaries to import to firebase. I need to group the different country informations for the same company in a nested list of dictionaries. This is the desired output:

[ {'id':'1', 'agency_name':'batstop', countries [{'country':'usa','country_id':'xx'}, {'country':'italy','country_id':'yy'}]} ,         
 {'id':'2', 'agency_name':'biorice', countries [{'country':'italy','country_id':'yy'}]},
 {'id':'3', 'legstart':'legstart', countries [{'country':'canada','country_id':'zz'}]} ]

Kenny Aires · Accepted Answer

Recently I had a similar task, the groupby function from itertools and the itemgetter function from operator - both standard python libraries - helped me a lot. Here's the code considering your csv, note how defining the primary keys of your csv dataset is important.

import csv
import json

from operator import itemgetter
from itertools import groupby


primary_keys = ['id', 'company_name']

# Start extraction
with open('input.csv', 'r') as file:
    # Read data from csv
    reader = csv.DictReader(file)

    # Sort data accordingly to primary keys
    reader = sorted(reader, key=itemgetter(*primary_keys))

# Create a list of tuples
# Each tuple containing a dict of the group primary keys and its values, and a list of the group ordered dicts
groups = [(dict(zip(primary_keys, _[0])), list(_[1])) for _ in groupby(reader, key=itemgetter(*primary_keys))]

# Create formatted dict to be converted into firebase objects
group_dicts = []
for group in groups:
    group_dict = {
        "id": group[0]['id'],
        "agency_name": group[0]['company_name'],
        "countries": [
            dict(country=_['country'], country_id=_['country_id']) for _ in group[1]
        ],
    }
    group_dicts.append(group_dict)

print("
".join([json.dumps(_, indent=2) for _ in group_dicts]))

Here's the output:

{
  "id": "1",
  "agency_name": "batstop",
  "countries": [
    {
      "country": "usa",
      "country_id": " xx"
    },
    {
      "country": "italy",
      "country_id": " yy"
    }
  ]
}
{
  "id": "2",
  "agency_name": "biorice",
  "countries": [
    {
      "country": "italy",
      "country_id": " yy"
    }
  ]
}
{
  "id": "3",
  "agency_name": "legstart",
  "countries": [
    {
      "country": "canada",
      "country_id": " zz"
    }
  ]
}

There's no external library, Hope it suits you well!

python trasform data from csv to array of dictionaries and group by field value

Answers (2)

Related Questions