Reputation: 17
Any help is really appreciated,
I have the below JSON, provided by API call. I've omitted sensitive data, but the key names are as presented ("value", "value_raw").
[{
"Position": "1234",
"StartDate": "2020-11-21",
"ID": "1234",
"CloseDate": "2020-12-07",
"Title": "This is a title",
"data": [{
"value": 1234
},
{
"value": "some text"
},
{
"value": "some text",
"value_raw": 11111
},
{
"value_raw": 11111,
"value": "some text"
},
{
"value": "null"
},
{
"value": "some text",
"value_raw": 22222
},
{
"value_raw": 2222222,
"value": "some text"
},
{
"value_raw": "null",
"value": "null"
},
{
"value_raw": "null",
"value": "null"
},
{
"value_raw": 2222222,
"value": "some text"
},
{
"value": "null"
},
{
"value": "some text",
"value_raw": 2222
},
{
"value": 1
},
{
"value": "some text",
"value_raw": 22222
}
]
}, {
"Position": "1235",
"StartDate": "2020-12-21",
"ID": "1235",
"CloseDate": "2021-01-12",
"Title": "some text",
"data": [{
"value": 1235
},
{
"value": "some text"
},
{
"value": "some text",
"value_raw": 1111
},
{
"value": "some text",
"value_raw": 1111
},
{
"value": "null"
},
{
"value_raw": 1111,
"value": "some text"
},
{
"value_raw": 11111,
"value": "some text"
},
{
"value_raw": "null",
"value": "null"
},
{
"value": "some text",
"value_raw": 1111
},
{
"value_raw": "null",
"value": "null"
},
{
"value": "null"
},
{
"value": "some text",
"value_raw": 22222
},
{
"value": 1
},
{
"value_raw": 22222,
"value": "some text"
}
]
}, {
"ID": "1236",
"Position": "1236",
"StartDate": "2021-07-12",
"data": [{
"value": 1236
},
{
"value": "some text"
},
{
"value_raw": 1111,
"value": "some text"
},
{
"value": "some text",
"value_raw": 1111
},
{
"value": "null"
},
{
"value_raw": 1111,
"value": "some text"
},
{
"value_raw": 1111,
"value": "some text"
},
{
"value_raw": "null",
"value": "null"
},
{
"value": "null",
"value_raw": "null"
},
{
"value": "some text",
"value_raw": 111
},
{
"value": "null"
},
{
"value": "some text",
"value_raw": 12223
},
{
"value": 1
},
{
"value": "some text",
"value_raw": 2222
}
],
"Title": "some text",
"CloseDate": "2021-07-23"
}
]
When I normalize "data" using;
df = pd.json_normalize(mydata, record_path=['data'])
I end up with an output of 2 columns x 42 rows (excl. headings), illustration:
value | value_raw |
---|---|
1234 | |
This is a title | |
some text | 11111 |
Corporation | 11111 |
null | |
some text | 22222 |
some text | 2222222 |
null | null |
null | null |
The only data im interested in is the key "value", I'd also like to know how to lay this data out as 3 rows x 14 columns (one row for each ID = '1234', '1235' & '1236' and no column headings needed as they provide zero benefit with the naming convention "value")
Any starting point would be great, I have spent hours looking at previous questions. What I have noticed is that the JSON I receive is very different to all of the examples out there.
Thanks everyone
Upvotes: 1
Views: 514
Reputation: 294
Nice question. I think it's not possible with json_normalize. Therefore did it with a loop and list comprehension:
values_list_all_rows = []
for json_element in json_list:
values_list_per_row = [value_dict["value"] for value_dict in json_element["data"] if "value" in value_dict]
values_list_all_rows.append(values_list_per_row)
pd.DataFrame(values_list_all_rows)
Gives (it will set None as value where rows has less values than other column):
0 1 2 3 4 5 6 7 8 9 10 11 12 13
1234 some text some text some text null some text some text null null some text null some text 1 some text
1235 some text some text some text null some text some text null some text null null some text 1 some text
1236 some text some text some text null some text some text null null some text null some text 1 None
Upvotes: 2