Reputation: 11
I have the following csv file (1.csv):
"STUB_1","current_week","previous_week","weekly_diff" "Crude Oil",1184.951,1191.649,-6.698
I need to convert it to the following JSON:
json_body = [
    {
        "measurement": "Crude Oil",
        "fields":
        {
            "weekly_diff": -6.698,
            "current_week": 1184.951,
            "previous_week": 1191.649
        }
    }
]
import pandas as pd

df = pd.read_csv("1.csv")
df = df.rename(columns={'STUB_1': 'measurement'})
j = (df.groupby(['measurement'], as_index=True)
       .apply(lambda x: x[['current_week', 'previous_week', 'weekly_diff']].to_dict('records'))
       .reset_index()
       .rename(columns={0: 'fields'})
       .to_json(orient='records'))
print(j)
output:
[ { "measurement": "Crude Oil", "fields": [ #extra bracket { "weekly_diff": -6.698, "current_week": 1184.951, "previous_week": 1191.649 } ] # extra bracket } ]
which is almost what I need, but with the extra [ ] around the fields.
Can anyone tell me what I did wrong? Thank you!
Upvotes: 1
Views: 1865
Reputation: 25799
Don't use pandas for this - you would have to do a lot of manual unraveling to turn your table data into a hierarchical structure, so why not just skip the middle man and use the built-in csv and json modules to do the task for you, e.g.
import csv
import json

with open("1.csv", newline="") as f:  # open your CSV file for reading
    reader = csv.DictReader(f, quoting=csv.QUOTE_NONNUMERIC)  # DictReader for convenience
    data = [{"measurement": r.pop("STUB_1", None), "fields": r} for r in reader]  # convert!

data_json = json.dumps(data, indent=4)  # finally, serialize the data to JSON
print(data_json)
and you get:
[
    {
        "measurement": "Crude Oil",
        "fields": {
            "current_week": 1184.951,
            "previous_week": 1191.649,
            "weekly_diff": -6.698
        }
    }
]
However, keep in mind that with this approach every CSV row becomes its own entry in the list, so multiple rows sharing the same STUB_1 value are not merged into a single measurement - merging them would mean storing the fields as a list of dicts, which brings you right back to your original problem with the data.
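If you did want to merge rows that share a STUB_1 value, a rough sketch (assuming a list of field dicts per measurement is acceptable) could look like this:

import csv
import json
from collections import defaultdict

grouped = defaultdict(list)  # measurement -> list of its field dicts
with open("1.csv", newline="") as f:
    reader = csv.DictReader(f, quoting=csv.QUOTE_NONNUMERIC)
    for r in reader:
        grouped[r.pop("STUB_1", None)].append(r)  # group the remaining fields under their STUB_1

data = [{"measurement": m, "fields": rows} for m, rows in grouped.items()]
print(json.dumps(data, indent=4))

That, of course, reintroduces the nested list you were trying to avoid.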
A quick note on how it does what it does - first we create a csv.DictReader - it's a convenience reader that maps each row's entries to the header fields. It also uses quoting=csv.QUOTE_NONNUMERIC to ensure automatic conversion to floats for all non-quoted fields in your CSV. Then, in the list comprehension, it reads row by row from the reader and creates a new dict for each row - the measurement key gets the STUB_1 entry (which is immediately removed from the row with dict.pop()) and fields gets the remaining entries of the row. Finally, the json module is used to serialize this list into the JSON you want.
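If the one-liner is hard to follow, here is the same thing written as a plain loop (purely illustrative - it builds the same data list):

import csv

with open("1.csv", newline="") as f:
    reader = csv.DictReader(f, quoting=csv.QUOTE_NONNUMERIC)
    data = []
    for r in reader:
        # r is a dict keyed by the header, e.g.
        # {"STUB_1": "Crude Oil", "current_week": 1184.951, "previous_week": 1191.649, "weekly_diff": -6.698}
        measurement = r.pop("STUB_1", None)  # take the STUB_1 value out of the row dict...
        data.append({"measurement": measurement, "fields": r})  # ...and keep whatever is left as the fields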
Also, keep in mind that JSON (and Python < 3.7 dicts) doesn't guarantee the order of elements, so your measurement entry might appear after the fields entry, and the same goes for the sub-entries of fields. Order shouldn't matter anyway (except for a few very specific cases), but if you want to control it you can use collections.OrderedDict to build your dictionaries in the order you prefer them to appear once serialized to JSON.
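For example, a minimal sketch of that OrderedDict variant:

import csv
import json
from collections import OrderedDict

with open("1.csv", newline="") as f:
    reader = csv.DictReader(f, quoting=csv.QUOTE_NONNUMERIC)
    data = [OrderedDict([("measurement", r.pop("STUB_1", None)), ("fields", r)])
            for r in reader]

print(json.dumps(data, indent=4))  # "measurement" will always come before "fields"

On Python 3.7+ plain dicts already preserve insertion order, so this is mostly relevant on older versions.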
Upvotes: 2