Reputation: 43
df = pd.DataFrame(pd.read_csv(data_path+'MRI.csv',encoding='utf-8'))
# pd.set_option('display.max_columns',None)
df = df.set_index('Subject ID',drop=False)
# get SubjectID
suid = list(df['Subject ID'].unique())
# creat a new DataFrame
mri = pd.DataFrame()
# use pd.groupby()
grouped = df.groupby('Subject ID')
for id in suid:
group = grouped.get_group(id)
temp = group.duplicated(['Age']).astype(int)
# Insert temp after 'Age' as the sign of 'Age'
group.insert(7, 'temp',temp)
# Replace the value in 'Age' with the value of 'temp'
for index, row in group.iterrows():
if row['temp'] == 1:
group.loc[index, 'Age'] = np.nan
print(group)
break
after the replacement, the all value of 'Age' becomes NA.
Upvotes: 1
Views: 1259
Reputation: 8033
You can do this
df.loc[df.Visit.duplicated(), 'Age']=np.nan
Input
Visit Age
0 ADNI 42
1 ADNI 42
2 ADNI 42
3 ADNI 42
4 BDNI 34
5 BDNI 34
6 BDNI 34
7 BDNI 34
Output
print(df)
Visit Age
0 ADNI 42.0
1 ADNI NaN
2 ADNI NaN
3 ADNI NaN
4 BDNI 34.0
5 BDNI NaN
6 BDNI NaN
7 BDNI NaN
Upvotes: 3