weiboo pan

Reputation: 43

How to replace duplicate values with null in DataFrame?

Within the same Subject ID, how can I replace duplicate values in Age with NA according to the Visit category, while keeping the value of the first occurrence?

Here is my code:
import numpy as np
import pandas as pd

df = pd.read_csv(data_path + 'MRI.csv', encoding='utf-8')
# pd.set_option('display.max_columns', None)
df = df.set_index('Subject ID', drop=False)
# get the unique Subject IDs
suid = list(df['Subject ID'].unique())
# create a new DataFrame
mri = pd.DataFrame()
# use pd.groupby()
grouped = df.groupby('Subject ID')
for subject_id in suid:
    group = grouped.get_group(subject_id)
    temp = group.duplicated(['Age']).astype(int)
    # Insert temp after 'Age' as a marker for duplicated 'Age' values
    group.insert(7, 'temp', temp)
    # Set 'Age' to NaN wherever 'temp' is 1
    for index, row in group.iterrows():
        if row['temp'] == 1:
            group.loc[index, 'Age'] = np.nan
    print(group)
    break
After the replacement, all values of 'Age' become NA.
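A likely cause of every Age becoming NA: after set_index('Subject ID', drop=False), all rows of one subject share the same index label. Inside iterrows() the loop variable index is that shared label, so group.loc[index, 'Age'] = np.nan writes NaN to every row carrying the label, not only to the duplicated row. The sketch below reproduces this on a small hypothetical frame (the subject "S1" and its ages are made up) and contrasts it with a boolean-mask assignment, which does not rely on index labels being unique.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one subject whose first two rows repeat the same Age.
df = pd.DataFrame({
    "Subject ID": ["S1", "S1", "S1"],
    "Visit": ["ADNI", "ADNI", "BDNI"],
    "Age": [42.0, 42.0, 43.0],
})

# Same setup as the question: the index now holds the label "S1" three times.
indexed = df.set_index("Subject ID", drop=False)

# Assigning through a duplicated label hits every row that carries that label.
buggy = indexed.copy()
buggy.loc["S1", "Age"] = np.nan
print(buggy["Age"].isna().all())  # True: all three rows were overwritten

# A boolean mask selects rows by position in the frame, so only the repeat changes.
fixed = indexed.copy()
fixed.loc[fixed.duplicated(["Subject ID", "Age"]), "Age"] = np.nan
print(fixed["Age"].tolist())  # [42.0, nan, 43.0]
```

The mask version also removes the need for the temp column and the row-by-row loop.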

Upvotes: 1

Views: 1259

Answers (1)

moys

Reputation: 8033

You can do this with a boolean mask from duplicated():

import numpy as np

df.loc[df.Visit.duplicated(), 'Age'] = np.nan
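To see which rows the one-liner touches, here is the mask on its own, rebuilt from the Visit column of the sample input. duplicated() uses keep='first' by default, so the first row of each Visit category is False and keeps its Age; keep='last' would preserve the last occurrence instead.

```python
import pandas as pd

# Visit column from the sample input: four ADNI rows followed by four BDNI rows.
visits = pd.Series(["ADNI"] * 4 + ["BDNI"] * 4, name="Visit")

# keep="first" marks every repeat after the first occurrence as True;
# those True rows are the ones whose Age gets set to NaN.
mask = visits.duplicated(keep="first")
print(mask.tolist())  # [False, True, True, True, False, True, True, True]
```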

Input

   Visit    Age
0   ADNI    42
1   ADNI    42
2   ADNI    42
3   ADNI    42
4   BDNI    34
5   BDNI    34
6   BDNI    34
7   BDNI    34

Output

print(df)
   Visit    Age
0   ADNI    42.0
1   ADNI    NaN
2   ADNI    NaN
3   ADNI    NaN
4   BDNI    34.0
5   BDNI    NaN
6   BDNI    NaN
7   BDNI    NaN
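The one-liner looks at Visit across the whole frame, so when several subjects share the same Visit categories, only the first subject keeps its Age. For the per-subject behavior asked about in the question, one option is to pass both columns to duplicated(subset=...). This is a sketch assuming the columns are named 'Subject ID', 'Visit' and 'Age' as in the question; the two subjects and their ages are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two subjects that both have ADNI and BDNI visits.
df = pd.DataFrame({
    "Subject ID": ["S1", "S1", "S1", "S2", "S2", "S2"],
    "Visit": ["ADNI", "ADNI", "BDNI", "ADNI", "ADNI", "BDNI"],
    "Age": [42.0, 42.0, 43.0, 34.0, 34.0, 35.0],
})

# A row counts as a duplicate only if the same (Subject ID, Visit) pair appeared
# earlier, so each subject keeps the Age of its first row for every Visit.
dup = df.duplicated(subset=["Subject ID", "Visit"], keep="first")
df.loc[dup, "Age"] = np.nan
print(df)
```

On this data, df.Visit.duplicated() alone would also blank row 3, the first ADNI row of S2, because ADNI already appeared for S1.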

Upvotes: 3
