Reputation: 2158
I am trying to merge nested list of dictionary key/value into single key and list of values. I am loading csv file into data frame and from that I am trying to convert it into nested json. Please see below I have tried this. Should I be going this route to create json or does pandas have a native functionality that do this type of conversion?
Sample Data:
Subject,StudentName,Category
ENGLISH,Jane,
ENGLISH,,A
MATH,Matt,B
MATH,Newman,AA
MATH,,B
MATH,Dylan,A
ENGLISH,Noah,
ENGLISH,,C
Tried this:
df1 = pd.read_csv('../data/file.csv')
json_doc = defaultdict(list)
for _id in df1.T:
data = df1.T[_id]
key = data.Subject
values = {'StudentName': data.StudentName,'Category':data.Category}
json_doc[key].append(values)
new_d = json.dumps(json_doc, indent=4)
{k: int(v) for k, v in new_d} # error: ValueError: not enough values to unpack (expected 2, got 1)
and I get this from the code above:
{
"ENGLISH": [
{
"StudentName": "Jane",
"Category": NaN
},
{
"StudentName": NaN,
"Category": "A"
},
{
"StudentName": "Noah",
"Category": NaN
},
{
"StudentName": NaN,
"Category": "C"
}
],
"MATH": [
{
"StudentName": "Matt",
"Category": "B"
},
{
"StudentName": "Newman",
"Category": "AA"
},
{
"StudentName": NaN,
"Category": "B"
},
{
"StudentName": "Dylan",
"Category": "A"
}
]
}
How do I merge key/value to get it look like this one?
{
"ENGLISH": [
{
"StudentName": ["Jane","Noah"],
"Category": ["A","C"]
}
],
"MATH": [
{
"StudentName": ["Matt","Newman","Dylan"]
"Category": ["B","AA","A"]
}
]
}
Upvotes: 0
Views: 257
Reputation: 15738
It is not entirely clear to me if it is safe to ignore missing values, but here is my one-liner:
df.groupby('Subject').agg(lambda g: list(g.dropna())).to_dict(orient='index')
Default methods (to_json, to_dict) do not have a suitable orient
option. So, we have to do some work by hands by grouping by index and then converting column data to a list. Then, .to_dict(orient='index')
will do what you want (replace with to_json
if you want a string instead of an object).
Note: Subject
here is expected to be a column, not an index.
Upvotes: 1