Reputation: 33
I would like to convert a CSV file into a JSON file using Python 2.7. Below are the sample CSV, the JSON output I am expecting, and the Python code I tried, which is not giving me the expected result. I would also like to know if there is a simpler way to do this than mine. Any help is appreciated.
zipcode,date,state,val1,val2,val3,val4,val5
95110,2015-05-01,CA,50,30.00,5.00,3.00,3
95110,2015-06-01,CA,67,31.00,5.00,3.00,4
95110,2015-07-01,CA,97,32.00,5.00,3.00,6
{
    "zipcode": "95110",
    "state": "CA",
    "subset": [
        {
            "date": "2015-05-01",
            "val1": "50",
            "val2": "30.00",
            "val3": "5.00",
            "val4": "3.00",
            "val5": "3"
        },
        {
            "date": "2015-06-01",
            "val1": "67",
            "val2": "31.00",
            "val3": "5.00",
            "val4": "3.00",
            "val5": "4"
        },
        {
            "date": "2015-07-01",
            "val1": "97",
            "val2": "32.00",
            "val3": "5.00",
            "val4": "3.00",
            "val5": "6"
        }
    ]
}
import pandas as pd
from itertools import groupby
import json
df = pd.read_csv('SampleCsvFile.csv')
names = df.columns.values.tolist()
data = df.values
master_list2 = [ (d["zipcode"], d["state"], d) for d in [dict(zip(names, d)) for d in data] ]
intermediate2 = [(k, [x[2] for x in list(v)]) for k,v in groupby(master_list2, lambda t: (t[0],t[1]) )]
nested_json2 = [dict(zip(names,(k[0][0], k[0][1], k[1]))) for k in [(i[0], i[1]) for i in intermediate2]]
#print json.dumps(nested_json2, indent=4)
with open('ExpectedJsonFile.json', 'w') as outfile:
    outfile.write(json.dumps(nested_json2, indent=4))
Upvotes: 3
Views: 16923
Reputation: 944
Since you are using pandas already, I tried to get as much mileage as I could out of dataframe methods. I also ended up wandering fairly far afield from your implementation. I think the key here, though, is don't try to get too clever with your list and/or dictionary comprehensions. You can very easily confuse yourself and everyone who reads your code.
import pandas as pd
from collections import OrderedDict
import json

# Read every column as a string so the values keep their original formatting.
df = pd.read_csv('SampleCsvFile.csv', dtype={
    "zipcode" : str,
    "date" : str,
    "state" : str,
    "val1" : str,
    "val2" : str,
    "val3" : str,
    "val4" : str,
    "val5" : str
})

results = []
for (zipcode, state), bag in df.groupby(["zipcode", "state"]):
    # Everything except the two grouping keys goes into the nested "subset" list.
    contents_df = bag.drop(["zipcode", "state"], axis=1)
    subset = [OrderedDict(row) for i, row in contents_df.iterrows()]
    results.append(OrderedDict([("zipcode", zipcode),
                                ("state", state),
                                ("subset", subset)]))

print json.dumps(results[0], indent=4)

#with open('ExpectedJsonFile.json', 'w') as outfile:
#    outfile.write(json.dumps(results[0], indent=4))
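If you want every zipcode/state group in the file rather than just the first one, dumping the whole results list should work the same way:

with open('ExpectedJsonFile.json', 'w') as outfile:
    outfile.write(json.dumps(results, indent=4))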
The simplest way to have all the JSON values written as strings, and to retain their original formatting, was to force read_csv to parse them as strings. If, however, you need to do any numerical manipulation on the values before writing out the JSON, you will have to let read_csv parse them numerically, then coerce them back into the proper string format before converting to JSON.
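For instance, here is a minimal sketch of that second route, assuming the sample columns above and a two-decimal format for the .00 columns; the coerced frame can then be fed through the same groupby/OrderedDict code:

import pandas as pd

# Let read_csv infer numeric dtypes so arithmetic is possible.
df = pd.read_csv('SampleCsvFile.csv')

# ...do any numerical manipulation here, e.g. df["val2"] = df["val2"] * 1.10

# Coerce everything back to the string formats shown in the expected JSON
# (two decimals is an assumed format) before building the nested structure.
for col in ["zipcode", "val1", "val5"]:
    df[col] = df[col].astype(str)
for col in ["val2", "val3", "val4"]:
    df[col] = df[col].map(lambda x: "%.2f" % x)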
Upvotes: 2