Sal
Sal

Reputation: 1693

Python csv columns to JSON

I have the following csv

id, o1, o2, o3
'jess', 1.0, 4, 0.3
'jill', 0, 5, 0.123
'jamie', -3, 0.2, 1.0

and would like it in a nested json with each column as a json keyed on the header name:

myjson = {
    "o1": {"jess": 1.0, "jill":   0, "jamie": -3}, 
    "o2": {"jess":   4, "jill":   5, "jamie": 0.2}, 
    "o3": {"jess": 0.3, "jill": 0.2, "jamie": 1.0}, 

Not sure the best (most pythonic) way to do this. Here is my first attempt:

import csv
with open(myfile, "r") as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    first = True
    for line in reader:
        if first:
            myjson = {key: dict() for key in line}
            header = list(line)
            first = False
        for i in range(len(header)):
            id = line[0]
            myjson[header[i+1]][id] = line[i+1]

I assume there is a better way to do this.

Edit: Should've specified this earlier but I do NOT want to use anything like Pandas. This needs to be super fast with minimal package dependencies.

Upvotes: 0

Views: 1612

Answers (3)

chfw
chfw

Reputation: 4592

Here is a simpler solution and you will need pyexcel and pyexcel-text:

>>> import pyexcel as p
>>> sheet=p.get_sheet(file_name='test.csv')
>>> sheet
test.csv:
+---------+-----+-----+-------+
| id      | o1  | o2  | o3    |
+---------+-----+-----+-------+
| 'jess'  | 1.0 | 4   | 0.3   |
+---------+-----+-----+-------+
| 'jill'  | 0   | 5   | 0.123 |
+---------+-----+-----+-------+
| 'jamie' | 3   | 0.2 | 1.0   |
+---------+-----+-----+-------+
>>> sheet.transpose()
>>> sheet.name_columns_by_row(0)
>>> sheet.name_rows_by_column(0)
>>> sheet
test.csv:
+----+--------+--------+---------+
|    | 'jess' | 'jill' | 'jamie' |
+====+========+========+=========+
| o1 | 1.0    | 0      | 3       |
+----+--------+--------+---------+
| o2 | 4      | 5      | 0.2     |
+----+--------+--------+---------+
| o3 | 0.3    | 0.123  | 1.0     |
+----+--------+--------+---------+
>>> sheet.get_json(write_title=False) # pip install pyexcel-text
'{"o1": {"\'jamie\'": 3, "\'jess\'": 1.0, "\'jill\'": 0}, "o2": {"\'jamie\'": "0.2", "\'jess\'": 4, "\'jill\'": 5}, "o3": {"\'jamie\'": 1.0, "\'jess\'": "0.3", "\'jill\'": "0.123"}}'

Upvotes: 0

Bhargav
Bhargav

Reputation: 484

I definitely think the following answer is too long, but if you are in need of answer still, this works. I have created a test.csv as per your data

I don't know why you want to eliminate using pandas, but anyways

import csv
import itertools
from itertools import *
import json


def read_with_header():
    with open ('/Users/bhargavsaidama/test.csv', 'rb') as f:
        reader = csv.reader(f, delimiter = ',', quotechar = '|')
        row_count = 0
        keys = []
        for row in reader:
            row_count = row_count + 1
            keys.append(row)
        header = keys[0]
        return row_count, header


def reading_ignore_header():

    row_count, header = read_with_header()

    with open('/Users/bhargavsaidama/test.csv', 'rb') as f:
        f.next()
        # row_count = sum(1 for row in f)
        # # print row_count
        reader = csv.reader(f, delimiter = ',' , quotechar = '|')
        result = []
        values = ()

        for row in reader:
            # row_count is taken including header file in the above function
            values = tuple((itertools.combinations(row, 2)))[:(row_count-1)]  # row_count is important,since your keys are rows

            for x, y in values:
                result.append({x:y})
        return result, header

# The following function is taken from here
# http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]


def main():

    result, header = reading_ignore_header()
    final_values = list(chunks(result,3)) # here 3 reflects (row_count-1)
    header = header[1:]        # seems like u wanna ignore Id 
    data_str = json.dumps(dict(zip(header, final_values)))
    data_json = json.loads(data_str)
    print data_str, data_json
    return data_str, data_json


if __name__ == "__main__":
    main()

Hope it helps, if you can optimize it go ahead and do it. I will also learn :)

Thanks

Upvotes: 0

MattR
MattR

Reputation: 5126

This may be "cheating" but this has always worked for me. If it hasn't - nothing a little code couldn't fix. But I use the Pandas module. It really takes care of a lot of my data needs. I read the csv into a DataFrame and then Put the dataframe into JSON (or any other format)

import pandas as pd

df1 = pd.read_csv('YOUR_PATH_HERE')

df1.to_json('PATH_HERE')

It's super simple and easily customizable. You may need to fiddle around with more variables. Here is the documentation: read_csv, to_json and this is always a good one to read: 10 Minutes to Pandas

Upvotes: 3

Related Questions