Reputation: 2033
I have a dataframe that contains mother ids and multiple observations for the column (preDiabetes) as such:
ChildID MotherID preDiabetes
0 20 455 No
1 20 455 Not documented
2 13 102 NaN
3 13 102 Yes
4 702 946 No
5 82 571 No
6 82 571 Yes
7 82 571 Not documented
I want to transform the multiple observational feature (preDiabetes) into one with single observations for each MotherID.
To do this, I will create a new dataframe with feature newPreDiabetes and:
Therefore, my new dataframe will have single observation for the feature preDiabetes and unique MotherIDs as such:
ChildID MotherID newPreDiabetes
0 20 455 No
1 13 102 Yes
2 702 946 No
3 82 571 Yes
I am new to Python and Pandas, so I am not sure what the best way to achieve this is, but this is what I have tried so far:
# get list of all unique mother ids
uniqueMotherIds = pd.unique(df[['MotherID']].values.ravel())
# create new dataframe that will contain unique MotherIDs and single observations for newPreDiabetes
newDf = {'MotherID','newPreDiabetes' }
# iterate through list of all mother ids and look for preDiabetes=="Yes"
for id in uniqueMotherIds:
filteredDf= df[df['MotherID'] == id].preDiabetes=="Yes"
result = pd.concat([filteredDf, newDf])
The code is not yet complete and I would appreciate some help as I am not sure if I am on the right track!
Many thanks :)
Upvotes: 0
Views: 76
Reputation: 16876
df = pd.DataFrame({
'MotherID': [455, 455,102,102,946,571,571,571],
'preDiabetes' : ['No','Not documented', np.NaN,
'Yes', 'No','No','Yes','Not documented'],
'ChildID' : [20,20,13,13,702,82,82,82]
})
result = df.groupby(['MotherID', 'ChildID'])['preDiabetes'].apply(list).reset_index()
result['newPreDiabetes'] = result['preDiabetes'].apply(
lambda x: 'Yes' if 'Yes' in x else 'No')
result = result.drop(columns=['preDiabetes'])
Output:
MotherID ChildID newPreDiabetes
0 102 13 Yes
1 455 20 No
2 571 82 Yes
3 946 702 No
Upvotes: 1