Reputation: 137
I have a json
coming back from an API that I am appending to a list. After I finish making that call I need to flatten that data using pandas. I'm not sure how to do.
Code:
api_results = []
response = requests.post(target_url, data=doc, headers=login_details)
response_data = json.loads(response.text)
if type(response_data)==dict and 'error' in response_data.keys():
error_results.append(response_data)
else:
api_results.append(response_data)
When I call api_results
I my data looks like so:
[{"requesturl":"http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4","clientid":"123456789","adjustedsummaryguidelines":{"midrangeallabsence":46,"midrangeclaims":36,"atriskallabsence":374,"atriskclaims":98},"riskassessment":{"score":87.95,"status":"Red (Extreme)","magnitude":"86.65","volatility":"89.25"},"adjustedduration":{"bp":{"days":2},"cp95":{"alert":"yellow","days":185},"cp100":{"alert":"yellow","days":365}},"icdcodes":[{"code":"719.41","name":"Pain in joint, shoulder region","meandurationdays":{"bp":18,"cp95":72,"cp100":93}},{"code":"840.9","name":"Sprains and strains of unspecified site of shoulder and upper arm","meandurationdays":{"bp":10,"cp95":27,"cp100":35}}],"cfactors":{"legalrep":{"applied":"1","alert":"red"}},"alertdesc":{"red":"Recommend early intervention and priority medical case management.","yellow":"Consider early intervention and priority medical case management."}}
,{"clientid":"987654321","adjustedsummaryguidelines":{"midrangeallabsence":25,"midrangeclaims":42,"atriskallabsence":0,"atriskclaims":194},"riskassessment":{"score":76.85,"status":"Orange (High)","magnitude":"74.44","volatility":"79.25"},"adjustedduration":{"bp":{"days":2},"cp95":{"days":95},"cp100":{"alert":"yellow","days":193}},"icdcodes":[{"code":"724.2","name":"Lumbago","meandurationdays":{"bp":10,"cp95":38,"cp100":50}},{"code":"847.2","name":"Sprain of lumbar","meandurationdays":{"bp":10,"cp95":22,"cp100":29}}],"cfactors":{"legalrep":{"applied":"1","alert":"red"}},"alertdesc":{"red":"Recommend early intervention and priority medical case management.","yellow":"Consider early intervention and priority medical case management."}}]
I've been using json_normalize
but I know I'm not using this library correctly.
How do I flatten this data?
What I need is this:
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
| clientid|days| alert|days| alert|days|atriskallabsence|atriskclaims|midrangeallabsence|midrangeclaims| alertdesc|alert|applied|magnitude|score| status|volatility| code| bp|cp100|cp95|
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
|123456789| 2|yellow| 365|yellow| 185| 374| 98| 46| 36|[Recommend early ...| red| 1| 86.65|87.95|Red (Extreme)| 89.25|719.41| 18| 93| 72|
|123456789| 2|yellow| 365|yellow| 185| 374| 98| 46| 36|[Recommend early ...| red| 1| 86.65|87.95|Red (Extreme)| 89.25| 840.9| 10| 35| 27|
|987654321| 2|yellow| 193| null| 95| 0| 194| 25| 42|[Recommend early ...| red| 1| 74.44|76.85|Orange (High)| 79.25| 724.2| 10| 50| 38|
|987654321| 2|yellow| 193| null| 95| 0| 194| 25| 42|[Recommend early ...| red| 1| 74.44|76.85|Orange (High)| 79.25| 847.2| 10| 29| 22|
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
Upvotes: 1
Views: 269
Reputation: 62383
dict
in the 'icdcodes'
key
to have a separate row, the best option is to use pandas.json_normalize
.pandas.DataFrame.explode('icdcodes')
, which will expand the dataframe to have the appropriate number of rows for each 'clientid'
based on the number if dicts
in 'icdcodes'
..json_normalize()
on the 'icdcodes'
column, which is a list
of dicts
, where some the values
may also be dicts
..join
the two dataframes and drop the 'icdcodes'
columnpandas.DataFrame.rename()
to rename columns, and pandas.DataFrame.drop()
to drop unneeded columns, as necessary.import pandas as pd
# create the initial dataframe from api_results
df = pd.json_normalize(api_results).explode('icdcodes').reset_index(drop=True)
# create a dataframe for only icdcodes, which will expand all the lists of dicts
icdcodes = pd.json_normalize(df.icdcodes)
# join df to icdcodes and drop the icdcodes column
df = df.join(icdcodes).drop(['icdcodes'], axis=1)
# display(df)
requesturl clientid adjustedsummaryguidelines.midrangeallabsence adjustedsummaryguidelines.midrangeclaims adjustedsummaryguidelines.atriskallabsence adjustedsummaryguidelines.atriskclaims riskassessment.score riskassessment.status riskassessment.magnitude riskassessment.volatility adjustedduration.bp.days adjustedduration.cp95.alert adjustedduration.cp95.days adjustedduration.cp100.alert adjustedduration.cp100.days cfactors.legalrep.applied cfactors.legalrep.alert alertdesc.red alertdesc.yellow code name meandurationdays.bp meandurationdays.cp95 meandurationdays.cp100
0 http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4 123456789 46 36 374 98 87.95 Red (Extreme) 86.65 89.25 2 yellow 185 yellow 365 1 red Recommend early intervention and priority medical case management. Consider early intervention and priority medical case management. 719.41 Pain in joint, shoulder region 18 72 93
1 http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4 123456789 46 36 374 98 87.95 Red (Extreme) 86.65 89.25 2 yellow 185 yellow 365 1 red Recommend early intervention and priority medical case management. Consider early intervention and priority medical case management. 840.9 Sprains and strains of unspecified site of shoulder and upper arm 10 27 35
2 NaN 987654321 25 42 0 194 76.85 Orange (High) 74.44 79.25 2 NaN 95 yellow 193 1 red Recommend early intervention and priority medical case management. Consider early intervention and priority medical case management. 724.2 Lumbago 10 38 50
3 NaN 987654321 25 42 0 194 76.85 Orange (High) 74.44 79.25 2 NaN 95 yellow 193 1 red Recommend early intervention and priority medical case management. Consider early intervention and priority medical case management. 847.2 Sprain of lumbar 10 22 29
Upvotes: 1