Reputation: 766
The goal here is to count the number of patients for each type of diagnosis. In patient record, visit id is unique, yet in diagnosis record, since one visit could have multiple diagnosis, the same visit id could have multiple diagnosis id.
To do so, i think the 2 data frame need to be linked with the field visit id. Could anyone kindly shed some light on how to link the 2 json via Pandas and count the number of patients for each diagnosis. Many thanks
Patient record
JSON [Patient record]
[
{
"Doctor id":"AU1254",
"Patient":[
{
"Patient id":"BK1221",
"Patient name":"Tim"
}
],
"Visit id":"B0001"
},
{
"Doctor id":"AU8766",
"Patient":[
{
"Patient id":"BK1209",
"Patient name":"Sue"
}
],
"Visit id":"B0002"
},
{
"Doctor id":"AU1254",
"Patient":[
{
"Patient id":"BK1323",
"Patient name":"Sary"
}
],
"Visit id":"B0003"
}
]
Diagnosis record
JSON [Diagnosis record]
[
{
"Visit id":"B0001",
"Diagnosis":[
{
"diagnosis id":"D1001",
"diagnosis name":"fever"
},
{
"diagnosis id":"D1987",
"diagnosis name":"cough"
},
{
"diagnosis id":"D1265",
"diagnosis name":"running nose"
}
]
},
{
"Visit id":"B0002",
"Diagnosis":[
{
"diagnosis id":"D1987",
"diagnosis name":"cough"
},
{
"diagnosis id":"D1453",
"diagnosis name":"stomach ache"
}
]
}
]
Upvotes: 1
Views: 384
Reputation: 1875
You can use left merge()
on visit id
. merge
> from pandas.io.json import json_normalize
> import json
> json1 = <your first json here>
> json2 = <your second json here>
> df1=pd.json_normalize(data=json.loads(json1), record_path='Patient', meta=['Doctor id','Visit id'])
> df2=pd.json_normalize(data=json.loads(json2), record_path='Diagnosis', meta=['Visit id'])
> print(df1.merge(df2, on='Visit id', how='left').dropna())
Patient id Patient name Doctor id Visit id diagnosis id diagnosis name
0 BK1221 Tim AU1254 B0001 D1001 fever
1 BK1221 Tim AU1254 B0001 D1987 cough
2 BK1221 Tim AU1254 B0001 D1265 running nose
3 BK1209 Sue AU8766 B0002 D1987 cough
4 BK1209 Sue AU8766 B0002 D1453 stomach ache
You can also do some fancy grouping / printing
pd.pivot_table(df3, index=['Patient id','Patient name','Doctor id','Visit id'], values=['diagnosis id','diagnosis name'], aggfunc=list)
diagnosis id diagnosis name
Patient id Patient name Doctor id Visit id
BK1209 Sue AU8766 B0002 [D1987, D1453] [cough, stomach ache]
BK1221 Tim AU1254 B0001 [D1001, D1987, D1265] [fever, cough, running nose]
And count per diagnosis / per patient
df3.groupby(['diagnosis id', 'diagnosis name']).agg({'Patient name': [list, 'count']})
Patient name
list count
diagnosis id diagnosis name
D1001 fever [Tim] 1
D1265 running nose [Tim] 1
D1453 stomach ache [Sue] 1
D1987 cough [Tim, Sue] 2
Upvotes: 4
Reputation: 13349
Try: (x
--> JSON [Patient record]
, y
-->JSON [Diagnosis record]
df = pd.DataFrame(x)
df = pd.concat([df.pop('Patient').apply(lambda x: pd.Series(x[0])), df], axis=1)
df1 = pd.DataFrame(y)
df1 = pd.concat([df1.explode('Diagnosis')['Diagnosis'].apply(pd.Series), df1], axis=1)
df1.pop('Diagnosis')
df_merge = pd.merge(df,df1, on='Visit id', how='right')
df_merge:
Patient id Patient name Doctor id Visit id diagnosis id diagnosis name
0 BK1221 Tim AU1254 B0001 D1001 fever
1 BK1221 Tim AU1254 B0001 D1987 cough
2 BK1221 Tim AU1254 B0001 D1265 running nose
3 BK1209 Sue AU8766 B0002 D1987 cough
4 BK1209 Sue AU8766 B0002 D1453 stomach ache
To count:
df_merge.groupby('diagnosis name')['Patient id'].count()
Edit:
Try:
df_merge.groupby('diagnosis name').agg({'Patient name': [list, 'count']}).reset_index()
diagnosis name Patient name
list count
cough [Tim, Sue] 2
fever [Tim] 1
running nose [Tim] 1
stomach ache [Sue] 1
Upvotes: 1
Reputation: 89
Try the following for the patient records.
patients_df = pd.read_json(patients.json)
patient_id = []
patient_name =[]
# Get attributes from nested nested datatypes in Patient column
for patient in patients_df['Patients']:
patient_id = patient[0]['Patient id']
patient_name = patient[0]['Patient name']
# Add to the pandas dataframe
patients_df['Patient name'] = patient_name
patient_df['Patient id'] = patient_id
# Drop the 'Patient' column
patients_df = patients_df.drop(column='Patient')
Upvotes: 0