How to replace duplicate values with null in DataFrame?

Question

Here is my code.
How can I replace duplicate values in Age with NA within each Visit category, while keeping the value of the first occurrence?

import numpy as np
import pandas as pd

df = pd.read_csv(data_path + 'MRI.csv', encoding='utf-8')
# pd.set_option('display.max_columns', None)
df = df.set_index('Subject ID', drop=False)
# Get the unique subject IDs
suid = list(df['Subject ID'].unique())
# Create a new DataFrame
mri = pd.DataFrame()
# Group the rows by subject with groupby()
grouped = df.groupby('Subject ID')
for id in suid:
    group = grouped.get_group(id)
    # Flag rows whose 'Age' repeats an earlier row (1 = duplicate)
    temp = group.duplicated(['Age']).astype(int)
    # Insert 'temp' after 'Age' as a flag for 'Age'
    group.insert(7, 'temp', temp)
    # Set 'Age' to NaN wherever 'temp' flags a duplicate
    for index, row in group.iterrows():
        if row['temp'] == 1:
            group.loc[index, 'Age'] = np.nan
    print(group)
    break

After the replacement, every value in 'Age' becomes NA, including the first occurrence.

moys · Accepted Answer

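You can do this with a boolean mask from duplicated(), which is False for the first occurrence of each Visit and True for every later repeat:

df.loc[df.Visit.duplicated(), 'Age'] = np.nan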

Input

   Visit    Age
0   ADNI    42
1   ADNI    42
2   ADNI    42
3   ADNI    42
4   BDNI    34
5   BDNI    34
6   BDNI    34
7   BDNI    34

Output

print(df)
   Visit    Age
0   ADNI    42.0
1   ADNI    NaN
2   ADNI    NaN
3   ADNI    NaN
4   BDNI    34.0
5   BDNI    NaN
6   BDNI    NaN
7   BDNI    NaN
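
If the same Visit label appears for several subjects, as the question's 'Subject ID' column suggests, the mask probably needs to look at both columns. The sketch below assumes that layout; the sample rows are made up for illustration and only the column names come from the question. It also avoids the likely cause of the original problem: after set_index('Subject ID'), every row in a group shares the same index label, so group.loc[index, 'Age'] = np.nan assigns to all of those rows at once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Subject ID": ["S1", "S1", "S1", "S2", "S2", "S2"],
    "Visit": ["ADNI", "ADNI", "BDNI", "ADNI", "ADNI", "BDNI"],
    "Age": [42, 42, 43, 34, 34, 35],
})

# True for every row that repeats an earlier (Subject ID, Visit) pair.
# The first occurrence of each pair is False, so it keeps its Age.
is_repeat = df.duplicated(subset=["Subject ID", "Visit"], keep="first")

# Series.mask replaces values with NaN where the condition is True;
# the integer column is upcast to float so it can hold NaN.
df["Age"] = df["Age"].mask(is_repeat)

print(df)
```

Because the default RangeIndex is kept, each row has a unique label and no per-group loop is needed. If Age should be blanked only when the Age value itself repeats, adding "Age" to the subset list would be one way to express that.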
