Convert CSV to JSON with nested array

Question

I have a CSV file:

group, first, last
fans, John, Smith
fans, Alice, White
students, Ben, Smith
students, Joan, Carpenter
...

The output JSON file needs this format:

[
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
},
{
  "group" : "students",
  "user" : [
    {
      "first" : "Ben",
      "last" :  "Smith"
    },
    {
      "first" : "Joan",
      "last" :  "Carpenter"
    }
  ]
}
]

jschnurr · Accepted Answer

Short answer
Use itertools.groupby, as described in the documentation.

Long answer
This is a multi-step process.

Start by reading your CSV into a list of dicts:

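from csv import DictReader

# newline='' is what the csv module docs recommend when opening CSV files
with open('data.csv', newline='') as csvfile:
    # skipinitialspace=True drops the space that follows each comma in the sample data
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]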
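
To see why skipinitialspace matters for this file, here is a small sketch that reads the same rows from an in-memory string instead of data.csv (the io.StringIO stand-in and the variable names are assumptions for illustration):

```python
import io
from csv import DictReader

# Stand-in for data.csv, using rows from the question.
sample = io.StringIO(
    "group, first, last\n"
    "fans, John, Smith\n"
    "fans, Alice, White\n"
)

# Without skipinitialspace, the space after each comma stays in keys and values.
raw_rows = [dict(d) for d in DictReader(sample)]

sample.seek(0)
# With skipinitialspace=True, that leading space is dropped.
clean_rows = [dict(d) for d in DictReader(sample, skipinitialspace=True)]

print(raw_rows[0])    # {'group': 'fans', ' first': ' John', ' last': ' Smith'}
print(clean_rows[0])  # {'group': 'fans', 'first': 'John', 'last': 'Smith'}
```

Without the flag, later lookups such as d['first'] would fail, because the key would actually be ' first'.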

groupby only merges consecutive items that share a key, so the data has to be sorted by that key first. Define a function that returns the key, and pass it to sorted like so:

def keyfunc(x):
    return x['group']

data = sorted(data, key=keyfunc)
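
A quick way to convince yourself that the sort is needed: groupby starts a new group whenever the key changes, so an unsorted input can produce the same key more than once. A minimal sketch with made-up labels:

```python
from itertools import groupby

# Hypothetical unsorted input: "fans" appears in two separate runs.
labels = ["fans", "students", "fans"]

unsorted_keys = [k for k, _ in groupby(labels)]
sorted_keys = [k for k, _ in groupby(sorted(labels))]

print(unsorted_keys)  # ['fans', 'students', 'fans']
print(sorted_keys)    # ['fans', 'students']
```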

Last, call groupby, providing your sorted data and your key function:

from itertools import groupby
groups = []
for k, g in groupby(data, keyfunc):
    groups.append({
        "group": k,
        # copy each row without its 'group' key
        "user": [{field: value for field, value in d.items() if field != 'group'} for d in g]
    })

groupby walks the sorted data and, every time the key changes, hands a new pair to the for loop: k (the key for that group) and g (an iterator over the dict objects that belong to it). Here we just build one entry per group and store it in a list for later.
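
As an illustration of what the loop receives, the sketch below runs groupby over a few hand-written rows (the rows and the seen list are assumptions, not part of the original data flow) and records each key with the first names in its group:

```python
from itertools import groupby

rows = [
    {"group": "fans", "first": "John"},
    {"group": "fans", "first": "Alice"},
    {"group": "students", "first": "Ben"},
]

seen = []
for k, g in groupby(rows, key=lambda row: row["group"]):
    # g is a one-shot iterator, so copy out what is needed before the loop advances
    seen.append((k, [row["first"] for row in g]))

print(seen)  # [('fans', ['John', 'Alice']), ('students', ['Ben'])]
```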

In this example, the value for the user key is built with a fairly dense comprehension that strips the group key from every row. If you can live with that little bit of redundant data in each user, the whole line can be simplified to:

"user": list(g)

The result looks like this:

[
  {
    "group": "fans",
    "user": [
      {
        "first": "John",
        "last": "Smith"
      },
      {
        "first": "Alice",
        "last": "White"
      }
    ]
  },
  {
    "group": "students",
    "user": [
      {
        "first": "Ben",
        "last": "Smith"
      },
      {
        "first": "Joan",
        "last": "Carpenter"
      }
    ]
  }
]
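
Putting the steps together, one possible end-to-end sketch looks like the following. The function name rows_to_groups and the in-memory sample are assumptions for illustration, and operator.itemgetter is swapped in for keyfunc purely as a shorthand:

```python
import io
import json
from csv import DictReader
from itertools import groupby
from operator import itemgetter


def rows_to_groups(csvfile):
    """Read CSV rows and nest them under their 'group' value."""
    get_group = itemgetter("group")
    # sorted() is stable, so rows keep their file order within each group
    rows = sorted(DictReader(csvfile, skipinitialspace=True), key=get_group)
    return [
        {
            "group": name,
            # copy each row without its 'group' key
            "user": [{f: v for f, v in row.items() if f != "group"} for row in members],
        }
        for name, members in groupby(rows, key=get_group)
    ]


# Stand-in for data.csv, using the rows from the question.
sample = io.StringIO(
    "group, first, last\n"
    "fans, John, Smith\n"
    "fans, Alice, White\n"
    "students, Ben, Smith\n"
    "students, Joan, Carpenter\n"
)
groups = rows_to_groups(sample)
output = json.dumps(groups, indent=2)
print(output)
```

To write a file instead of a string, json.dump(groups, f, indent=2) with f opened for writing should do the same job; the output filename is up to you.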
