franco_b
franco_b

Reputation: 878

python trasform data from csv to array of dictionaries and group by field value

I have csv like this:

id,company_name,country,country_id
1,batstop,usa, xx
2,biorice,italy, yy
1,batstop,italy, yy
3,legstart,canada, zz

I want an array of dictionaries to import to firebase. I need to group the different country informations for the same company in a nested list of dictionaries. This is the desired output:

[ {'id':'1', 'agency_name':'batstop', countries [{'country':'usa','country_id':'xx'}, {'country':'italy','country_id':'yy'}]} ,         
 {'id':'2', 'agency_name':'biorice', countries [{'country':'italy','country_id':'yy'}]},
 {'id':'3', 'legstart':'legstart', countries [{'country':'canada','country_id':'zz'}]} ]

Upvotes: 0

Views: 71

Answers (2)

Kenny Aires
Kenny Aires

Reputation: 1438

Recently I had a similar task, the groupby function from itertools and the itemgetter function from operator - both standard python libraries - helped me a lot. Here's the code considering your csv, note how defining the primary keys of your csv dataset is important.

import csv
import json

from operator import itemgetter
from itertools import groupby


primary_keys = ['id', 'company_name']

# Start extraction
with open('input.csv', 'r') as file:
    # Read data from csv
    reader = csv.DictReader(file)

    # Sort data accordingly to primary keys
    reader = sorted(reader, key=itemgetter(*primary_keys))

# Create a list of tuples
# Each tuple containing a dict of the group primary keys and its values, and a list of the group ordered dicts
groups = [(dict(zip(primary_keys, _[0])), list(_[1])) for _ in groupby(reader, key=itemgetter(*primary_keys))]

# Create formatted dict to be converted into firebase objects
group_dicts = []
for group in groups:
    group_dict = {
        "id": group[0]['id'],
        "agency_name": group[0]['company_name'],
        "countries": [
            dict(country=_['country'], country_id=_['country_id']) for _ in group[1]
        ],
    }
    group_dicts.append(group_dict)

print("\n".join([json.dumps(_, indent=2) for _ in group_dicts]))

Here's the output:

{
  "id": "1",
  "agency_name": "batstop",
  "countries": [
    {
      "country": "usa",
      "country_id": " xx"
    },
    {
      "country": "italy",
      "country_id": " yy"
    }
  ]
}
{
  "id": "2",
  "agency_name": "biorice",
  "countries": [
    {
      "country": "italy",
      "country_id": " yy"
    }
  ]
}
{
  "id": "3",
  "agency_name": "legstart",
  "countries": [
    {
      "country": "canada",
      "country_id": " zz"
    }
  ]
}

There's no external library, Hope it suits you well!

Upvotes: 1

Swan Toma
Swan Toma

Reputation: 146

You can try this, you may have to change a few parts to get it working with your csv, but hope it's enough to get you started:

csv = [
    "1,batstop,usa, xx",
    "2,biorice,italy, yy",
    "1,batstop,italy, yy",
    "3,legstart,canada, zz"
]

output = {} # dictionary useful to avoid searching in list for existing ids

# Parse each row
for line in csv:

    cols = line.split(',')
    id = int(cols[0])
    agency_name = cols[1]
    country = cols[2]
    country_id = cols[3]

    if id in output:
            output[id]['countries'].append([{'country': country,
                                                'country_id': country_id}])
    else:
        output[id] = {'id': id,
                       'agency_name': agency_name,
                       'countries': [{'country': country,
                                      'country_id': country_id}]
                       }


# Put into list
json_output = []
for key in output.keys():
    json_output.append( output[key] )


# Check output
for row in json_output:
    print(row)

Upvotes: 0

Related Questions