sums22
sums22

Reputation: 2033

Transforming multiple observational feature to single observational feature in Pandas Dataframe Python

I have a dataframe that contains mother ids and multiple observations for the column (preDiabetes) as such:

    ChildID   MotherID   preDiabetes
0     20      455        No
1     20      455        Not documented
2     13      102        NaN
3     13      102        Yes
4     702     946        No
5     82      571        No
6     82      571        Yes
7     82      571        Not documented

I want to transform the multiple observational feature (preDiabetes) into one with single observations for each MotherID.

To do this, I will create a new dataframe with feature newPreDiabetes and:

Therefore, my new dataframe will have single observation for the feature preDiabetes and unique MotherIDs as such:

    ChildID   MotherID   newPreDiabetes
0   20        455        No
1   13        102        Yes
2   702       946        No
3   82        571        Yes

I am new to Python and Pandas, so I am not sure what the best way to achieve this is, but this is what I have tried so far:


    # get list of all unique mother ids
    uniqueMotherIds = pd.unique(df[['MotherID']].values.ravel())
    
    # create new dataframe that will contain unique MotherIDs and single observations for newPreDiabetes
    newDf = {'MotherID','newPreDiabetes' }
    
    # iterate through list of all mother ids and look for preDiabetes=="Yes"
    for id in uniqueMotherIds:
        filteredDf= df[df['MotherID'] == id].preDiabetes=="Yes"
        result = pd.concat([filteredDf, newDf])

The code is not yet complete and I would appreciate some help as I am not sure if I am on the right track!

Many thanks :)

Upvotes: 0

Views: 76

Answers (1)

mujjiga
mujjiga

Reputation: 16876

df = pd.DataFrame({
        'MotherID': [455, 455,102,102,946,571,571,571],
        'preDiabetes' : ['No','Not documented', np.NaN,
                         'Yes', 'No','No','Yes','Not documented'],
        'ChildID' : [20,20,13,13,702,82,82,82]                   
                   })

result = df.groupby(['MotherID', 'ChildID'])['preDiabetes'].apply(list).reset_index()
result['newPreDiabetes'] = result['preDiabetes'].apply(
    lambda x: 'Yes' if 'Yes' in x else 'No')
result = result.drop(columns=['preDiabetes'])

Output:


   MotherID ChildID newPreDiabetes
0   102     13      Yes
1   455     20      No
2   571     82      Yes
3   946     702     No

Upvotes: 1

Related Questions