Reputation: 1693
I have the following csv
id, o1, o2, o3
'jess', 1.0, 4, 0.3
'jill', 0, 5, 0.123
'jamie', -3, 0.2, 1.0
and would like it as nested json, with each column as a json object keyed on the header name:
myjson = {
    "o1": {"jess": 1.0, "jill": 0, "jamie": -3},
    "o2": {"jess": 4, "jill": 5, "jamie": 0.2},
    "o3": {"jess": 0.3, "jill": 0.123, "jamie": 1.0},
}
Not sure of the best (most pythonic) way to do this. Here is my first attempt:
import csv

with open(myfile, "r") as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    first = True
    for line in reader:
        if first:
            header = list(line)
            myjson = {key: dict() for key in header[1:]}  # skip the id column
            first = False
            continue  # the header row holds no data
        id = line[0]
        for i in range(len(header) - 1):  # range(len(header)) would IndexError on line[i + 1]
            myjson[header[i + 1]][id] = line[i + 1]
I assume there is a better way to do this.
Edit: Should've specified this earlier but I do NOT want to use anything like Pandas. This needs to be super fast with minimal package dependencies.
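For what it's worth, here is a stdlib-only sketch of the structure I'm after, using csv.DictReader (this assumes every value is numeric and the ids are single-quoted, as in the sample above):

```python
import csv

def csv_to_nested(path):
    """Build {column: {id: value}} from a CSV whose first column is the id."""
    with open(path, newline="") as f:
        # skipinitialspace handles the "id, o1, o2" spacing in the sample
        reader = csv.DictReader(f, skipinitialspace=True)
        id_field = reader.fieldnames[0]
        nested = {name: {} for name in reader.fieldnames[1:]}
        for row in reader:
            row_id = row[id_field].strip("'")  # drop the literal single quotes
            for name in nested:
                nested[name][row_id] = float(row[name])
    return nested
```

The float() call is an assumption; drop it if some columns are not numeric.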
Upvotes: 0
Views: 1612
Reputation: 4592
Here is a simpler solution; you will need pyexcel and pyexcel-text:
>>> import pyexcel as p
>>> sheet=p.get_sheet(file_name='test.csv')
>>> sheet
test.csv:
+---------+-----+-----+-------+
| id      | o1  | o2  | o3    |
+---------+-----+-----+-------+
| 'jess'  | 1.0 | 4   | 0.3   |
+---------+-----+-----+-------+
| 'jill'  | 0   | 5   | 0.123 |
+---------+-----+-----+-------+
| 'jamie' | -3  | 0.2 | 1.0   |
+---------+-----+-----+-------+
>>> sheet.transpose()
>>> sheet.name_columns_by_row(0)
>>> sheet.name_rows_by_column(0)
>>> sheet
test.csv:
+----+--------+--------+---------+
|    | 'jess' | 'jill' | 'jamie' |
+====+========+========+=========+
| o1 | 1.0    | 0      | -3      |
+----+--------+--------+---------+
| o2 | 4      | 5      | 0.2     |
+----+--------+--------+---------+
| o3 | 0.3    | 0.123  | 1.0     |
+----+--------+--------+---------+
>>> sheet.get_json(write_title=False) # pip install pyexcel-text
'{"o1": {"\'jamie\'": -3, "\'jess\'": 1.0, "\'jill\'": 0}, "o2": {"\'jamie\'": "0.2", "\'jess\'": 4, "\'jill\'": 5}, "o3": {"\'jamie\'": 1.0, "\'jess\'": "0.3", "\'jill\'": "0.123"}}'
Upvotes: 0
Reputation: 484
I definitely think the following answer is too long, but if you still need an answer, this works. I have created a test.csv as per your data.
I don't know why you want to avoid pandas, but anyway:
import csv
import itertools
import json

def read_with_header():
    # Read the whole file once to get the row count and the header row.
    with open('/Users/bhargavsaidama/test.csv', 'r') as f:
        reader = csv.reader(f, delimiter=',', quotechar='|')
        rows = list(reader)
    return len(rows), rows[0]

def reading_ignore_header():
    row_count, header = read_with_header()
    with open('/Users/bhargavsaidama/test.csv', 'r') as f:
        next(f)  # skip the header line
        reader = csv.reader(f, delimiter=',', quotechar='|')
        result = []
        for row in reader:
            # The first len(row) - 1 pairs from combinations(row, 2) are
            # exactly the (id, value) pairs for this row's data columns.
            values = tuple(itertools.combinations(row, 2))[:len(row) - 1]
            for x, y in values:
                result.append({x: y})
    return result, header

# The following function is taken from here
# http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def main():
    result, header = reading_ignore_header()
    header = header[1:]  # seems like u wanna ignore id
    per_row = list(chunks(result, len(header)))  # one chunk per data row
    # Transpose the chunks so each group holds a single column's
    # {id: value} dicts, then merge them under the matching header name.
    merged = {name: {k: v for d in column for k, v in d.items()}
              for name, column in zip(header, zip(*per_row))}
    data_str = json.dumps(merged)
    data_json = json.loads(data_str)
    print(data_str, data_json)
    return data_str, data_json

if __name__ == "__main__":
    main()
Hope it helps; if you can optimize it, go ahead and do it. I will also learn :)
Thanks
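For comparison, the same nested shape can be built without itertools, by pairing each header name with its column in a dict comprehension (a sketch assuming the same test.csv layout; values stay strings):

```python
import csv

def nested_from_csv(path):
    # Read all rows, then map each non-id header to {id: value} for its column.
    with open(path, newline="") as f:
        rows = list(csv.reader(f, skipinitialspace=True))
    header, data = rows[0], rows[1:]
    return {name: {row[0].strip("'"): row[i] for row in data}
            for i, name in enumerate(header) if i > 0}
```

The strip("'") is only there because the sample ids are single-quoted; drop it for plain ids.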
Upvotes: 0
Reputation: 5126
This may be "cheating", but it has always worked for me, and when it hasn't, it was nothing a little code couldn't fix. I use the pandas module. It takes care of a lot of my data needs: I read the csv into a DataFrame and then write the DataFrame out as JSON (or any other format).
import pandas as pd
df1 = pd.read_csv('YOUR_PATH_HERE')
df1.to_json('PATH_HERE')
It's super simple and easily customizable, though you may need to fiddle around with a few more parameters. Here is the documentation: read_csv, to_json, and this is always a good one to read: 10 Minutes to Pandas
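As a quick sketch of how that maps onto this exact question (assuming plain, unquoted ids in the csv): read_csv can take the id column as the index, and to_json with orient='columns' then produces exactly the nested {column: {id: value}} shape.

```python
import io
import json
import pandas as pd

csv_text = "id,o1,o2,o3\njess,1.0,4,0.3\njill,0,5,0.123\njamie,-3,0.2,1.0\n"

# index_col='id' makes the ids the row labels;
# orient='columns' nests the output by column name.
df = pd.read_csv(io.StringIO(csv_text), index_col="id")
nested = json.loads(df.to_json(orient="columns"))
```

Note that pandas upcasts mixed int/float columns to float, so 4 comes back as 4.0.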
Upvotes: 3