How to flatten a json from an api using pandas

Question

I have a json coming back from an API that I am appending to a list. After I finish making that call I need to flatten that data using pandas. I'm not sure how to do.

Code:

api_results = []

response = requests.post(target_url, data=doc, headers=login_details)
       response_data = json.loads(response.text)
       if type(response_data)==dict and 'error' in response_data.keys():
           error_results.append(response_data)
       else:
           api_results.append(response_data)

When I call api_results I my data looks like so:

[{"requesturl":"http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4","clientid":"123456789","adjustedsummaryguidelines":{"midrangeallabsence":46,"midrangeclaims":36,"atriskallabsence":374,"atriskclaims":98},"riskassessment":{"score":87.95,"status":"Red (Extreme)","magnitude":"86.65","volatility":"89.25"},"adjustedduration":{"bp":{"days":2},"cp95":{"alert":"yellow","days":185},"cp100":{"alert":"yellow","days":365}},"icdcodes":[{"code":"719.41","name":"Pain in joint, shoulder region","meandurationdays":{"bp":18,"cp95":72,"cp100":93}},{"code":"840.9","name":"Sprains and strains of unspecified site of shoulder and upper arm","meandurationdays":{"bp":10,"cp95":27,"cp100":35}}],"cfactors":{"legalrep":{"applied":"1","alert":"red"}},"alertdesc":{"red":"Recommend early intervention and priority medical case management.","yellow":"Consider early intervention and priority medical case management."}}
,{"clientid":"987654321","adjustedsummaryguidelines":{"midrangeallabsence":25,"midrangeclaims":42,"atriskallabsence":0,"atriskclaims":194},"riskassessment":{"score":76.85,"status":"Orange (High)","magnitude":"74.44","volatility":"79.25"},"adjustedduration":{"bp":{"days":2},"cp95":{"days":95},"cp100":{"alert":"yellow","days":193}},"icdcodes":[{"code":"724.2","name":"Lumbago","meandurationdays":{"bp":10,"cp95":38,"cp100":50}},{"code":"847.2","name":"Sprain of lumbar","meandurationdays":{"bp":10,"cp95":22,"cp100":29}}],"cfactors":{"legalrep":{"applied":"1","alert":"red"}},"alertdesc":{"red":"Recommend early intervention and priority medical case management.","yellow":"Consider early intervention and priority medical case management."}}]

I've been using json_normalize but I know I'm not using this library correctly.

How do I flatten this data?

What I need is this:

+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
| clientid|days| alert|days| alert|days|atriskallabsence|atriskclaims|midrangeallabsence|midrangeclaims|           alertdesc|alert|applied|magnitude|score|       status|volatility|  code| bp|cp100|cp95|
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
|123456789|   2|yellow| 365|yellow| 185|             374|          98|                46|            36|[Recommend early ...|  red|      1|    86.65|87.95|Red (Extreme)|     89.25|719.41| 18|   93|  72|
|123456789|   2|yellow| 365|yellow| 185|             374|          98|                46|            36|[Recommend early ...|  red|      1|    86.65|87.95|Red (Extreme)|     89.25| 840.9| 10|   35|  27|
|987654321|   2|yellow| 193|  null|  95|               0|         194|                25|            42|[Recommend early ...|  red|      1|    74.44|76.85|Orange (High)|     79.25| 724.2| 10|   50|  38|
|987654321|   2|yellow| 193|  null|  95|               0|         194|                25|            42|[Recommend early ...|  red|      1|    74.44|76.85|Orange (High)|     79.25| 847.2| 10|   29|  22|
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+

Trenton McKinney · Accepted Answer

Because the desired result is for the data from each dict in the 'icdcodes' key to have a separate row, the best option is to use pandas.json_normalize.
First create the main dataframe and use pandas.DataFrame.explode('icdcodes'), which will expand the dataframe to have the appropriate number of rows for each 'clientid' based on the number if dicts in 'icdcodes'.
Use .json_normalize() on the 'icdcodes' column, which is a list of dicts, where some the values may also be dicts.
.join the two dataframes and drop the 'icdcodes' column
Use pandas.DataFrame.rename() to rename columns, and pandas.DataFrame.drop() to drop unneeded columns, as necessary.
Also see this answer from SO: Splitting dictionary/list inside a Pandas Column into Separate Columns

import pandas as pd

# create the initial dataframe from api_results
df = pd.json_normalize(api_results).explode('icdcodes').reset_index(drop=True)

# create a dataframe for only icdcodes, which will expand all the lists of dicts
icdcodes = pd.json_normalize(df.icdcodes)

# join df to icdcodes and drop the icdcodes column
df = df.join(icdcodes).drop(['icdcodes'], axis=1)

# display(df)
                                                                                             requesturl   clientid  adjustedsummaryguidelines.midrangeallabsence  adjustedsummaryguidelines.midrangeclaims  adjustedsummaryguidelines.atriskallabsence  adjustedsummaryguidelines.atriskclaims  riskassessment.score riskassessment.status riskassessment.magnitude riskassessment.volatility  adjustedduration.bp.days adjustedduration.cp95.alert  adjustedduration.cp95.days adjustedduration.cp100.alert  adjustedduration.cp100.days cfactors.legalrep.applied cfactors.legalrep.alert                                                       alertdesc.red                                                   alertdesc.yellow    code                                                               name  meandurationdays.bp  meandurationdays.cp95  meandurationdays.cp100
0  http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4  123456789                                            46                                        36                                         374                                      98                 87.95         Red (Extreme)                    86.65                     89.25                         2                      yellow                         185                       yellow                          365                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.  719.41                                     Pain in joint, shoulder region                   18                     72                      93
1  http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4  123456789                                            46                                        36                                         374                                      98                 87.95         Red (Extreme)                    86.65                     89.25                         2                      yellow                         185                       yellow                          365                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.   840.9  Sprains and strains of unspecified site of shoulder and upper arm                   10                     27                      35
2                                                                                                   NaN  987654321                                            25                                        42                                           0                                     194                 76.85         Orange (High)                    74.44                     79.25                         2                         NaN                          95                       yellow                          193                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.   724.2                                                            Lumbago                   10                     38                      50
3                                                                                                   NaN  987654321                                            25                                        42                                           0                                     194                 76.85         Orange (High)                    74.44                     79.25                         2                         NaN                          95                       yellow                          193                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.   847.2                                                   Sprain of lumbar                   10                     22                      29

How to flatten a json from an api using pandas

Answers (1)

Related Questions