Dave Qi
Dave Qi

Reputation: 11

CSV to Multi level JSON structure

I have the following csv file (1.csv):

"STUB_1","current_week","previous_week","weekly_diff"
"Crude Oil",1184.951,1191.649,-6.698

Need to convert to the following json

json_body = [
    {
        "measurement":"Crude Oil",
        "fields":
            {
                "weekly_diff":-6.698,
                "current_week":1184.951,
                "previous_week":1191.649
                }
     }
]

df = pd.read_csv("1.csv")

df = df.rename(columns={'STUB_1': 'measurement'})

j = (df.groupby(['measurement'], as_index=True)
       .apply(lambda x: x[['current_week','previous_week', 'weekly_diff']].to_dict('r'))
       .reset_index()
       .rename(columns={0:'fields'})
       .to_json(orient='records'))

print j

output:

[
  {
    "measurement": "Crude Oil",
    "fields": 
    [  #extra bracket
      {
        "weekly_diff": -6.698,
        "current_week": 1184.951,
        "previous_week": 1191.649
      }
    ] # extra bracket
  }
]

which is almost what I need but with extra [ ].

can anyone help what I did wrong? thank you!

Upvotes: 1

Views: 1865

Answers (1)

zwer
zwer

Reputation: 25799

Don't use pandas for this - you would have to do a lot of manual unraveling to turn your table data into a hierarchical structure so why not just skip the middle man and use the built-in csv and json modules to do the task for you, e.g.

import csv
import json

with open("1.csv", "rU") as f:  # open your CSV file for reading
    reader = csv.DictReader(f, quoting=csv.QUOTE_NONNUMERIC)  # DictReader for convenience
    data = [{"measurement": r.pop("STUB_1", None), "fields": r} for r in reader]  # convert!
    data_json = json.dumps(data, indent=4)  # finally, serialize the data to JSON
    print(data_json)

and you get:

[
    {
        "measurement": "Crude Oil",
        "fields": {
            "current_week": 1184.951, 
            "previous_week": 1191.649, 
            "weekly_diff": -6.698
        }
    }
]

However, keep in mind that if you have multiple entries with the same STUB_1 value only the latest will be kept - otherwise you'd have to store your fields as a list which will bring you to your original problem with the data.

A quick note on how it does what it does - first we create a csv.DictReader - it's a convenience reader that will map each row's entry with the header fields. It also uses quoting=csv.QUOTE_NONNUMERIC to ensure automatic conversion to floats for all non-quoted fields in your CSV. Then, in the list comprehension, it essentially reads row by row from the reader and creates a new dict for each row - the measurement key contains the STUB_1 entry (which gets immediately removed with dict.pop()) and fields contains the remaining entries in the row. Finally, the json module is used to serialize this list into a JSON that you want.

Also, keep in mind that JSON (and Python <3.5) doesn't guarantee the order of elements so your measurement entry might appear after the fields entry and same goes for the sub-entries of fields. Order shouldn't matter anyway (except for a few very specific cases) but if you want to control it you can use collections.OrderedDict to build your inner dictionaries in the order you prefer to look at once serialized to JSON.

Upvotes: 2

Related Questions