Reputation: 25
I have a CSV file:
group, first, last
fans, John, Smith
fans, Alice, White
students, Ben, Smith
students, Joan, Carpenter
...
The output JSON file needs this format:
[
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
},
{
"group" : "students",
"user" : [
{
"first" : "Ben",
"last" : "Smith"
},
{
"first" : "Joan",
"last" : "Carpenter"
}
]
}
]
Upvotes: 0
Views: 2619
Reputation: 1191
Short answer
Use itertools.groupby, as described in the documentation.
Long answer
This is a multi-step process.
Start by getting your CSV into a list of dict:
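from csv import DictReader

# newline='' is what the csv module docs recommend when opening CSV files
with open('data.csv', newline='') as csvfile:
    # skipinitialspace=True drops the space after each comma
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]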
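To see why skipinitialspace matters for this particular file, here is a small sketch that reads the same rows with and without it. The in-memory sample is a stand-in for data.csv (an assumption for illustration; the variable names plain and stripped are made up here).

```python
import io
from csv import DictReader

# Hypothetical in-memory stand-in for data.csv, using rows from the question
sample = io.StringIO(
    "group, first, last\n"
    "fans, John, Smith\n"
    "students, Ben, Smith\n"
)

# Default settings: the space after each comma stays in keys and values
plain = [dict(row) for row in DictReader(sample)]

# Rewind and read again, this time skipping the space after each delimiter
sample.seek(0)
stripped = [dict(row) for row in DictReader(sample, skipinitialspace=True)]

print(plain[0])     # keys such as ' first' keep their leading space
print(stripped[0])  # keys and values are clean
```

Without the flag you would end up with keys like ' first', which would then leak into the JSON output.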
groupby needs sorted data, so define a function to get the key, and pass it in like so:
def keyfunc(x):
    return x['group']

data = sorted(data, key=keyfunc)
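The sorting step is easy to skip by accident, so here is a minimal sketch of what happens without it. groupby only merges consecutive rows that share a key, so unsorted input yields the same group more than once. The rows list below is made-up sample data, not the full file.

```python
from itertools import groupby

# Made-up rows where the 'fans' entries are not adjacent
rows = [
    {"group": "fans", "first": "John"},
    {"group": "students", "first": "Ben"},
    {"group": "fans", "first": "Alice"},
]

def keyfunc(row):
    return row["group"]

# Unsorted: 'fans' appears twice because its rows are not consecutive
unsorted_keys = [key for key, _ in groupby(rows, keyfunc)]

# Sorted by the same key function: each group appears exactly once
sorted_keys = [key for key, _ in groupby(sorted(rows, key=keyfunc), keyfunc)]

print(unsorted_keys)  # ['fans', 'students', 'fans']
print(sorted_keys)    # ['fans', 'students']
```

Since sorted is stable, rows inside each group keep their original file order.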
Last, call groupby, providing your sorted data and your key function:
from itertools import groupby

groups = []
for k, g in groupby(data, keyfunc):
    groups.append({
        "group": k,
        # Copy each row without its 'group' field
        "user": [{field: value for field, value in d.items() if field != 'group'} for d in g]
    })
This will iterate over your data, and every time the key changes, it drops into the for block and executes that code, providing k (the key for that group) and g (an iterator over the dict objects that belong to it). Here we just store those in a list for later.
In this example, the user key uses some pretty dense comprehensions to remove the group key from every row of user. If you can live with that little bit of extra data, that whole line can be simplified as:
"user": list(g)
The result looks like this:
[
{
"group": "fans",
"user": [
{
"first": "John",
"last": "Smith"
},
{
"first": "Alice",
"last": "White"
}
]
},
{
"group": "students",
"user": [
{
"first": "Ben",
"last": "Smith"
},
{
"first": "Joan",
"last": "Carpenter"
}
]
}
]
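The structure above is still a Python list, so one more step is needed to get actual JSON text. A minimal sketch using the standard json module follows; the groups value here is a shortened stand-in for the list built earlier, and indent=4 is just one formatting choice.

```python
import json

# Shortened stand-in for the groups list built by the groupby loop
groups = [
    {"group": "fans", "user": [{"first": "John", "last": "Smith"}]},
    {"group": "students", "user": [{"first": "Ben", "last": "Smith"}]},
]

# Serialize to a JSON string; indent=4 pretty-prints it
text = json.dumps(groups, indent=4)
print(text)

# Round trip to confirm the JSON parses back to the same structure
restored = json.loads(text)
```

To write a file instead of a string, json.dump(groups, f, indent=4) does the same thing against a file object f opened for writing.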
Upvotes: 1