Giannis
Giannis

Reputation: 83

Keeping the closest row based on another value in the same column

I have the following dataframe:

df = pd.DataFrame({'Subject': [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,5], 'dx1': ['CN','AD','AD','CN','UNC','AD','AD','CN','CN','CN','AD','CN','CN','CN','AD','AD']})

enter image description here

For each Subject I want to keep the previous or the following row where dx1 == 'CN', when dx1 == 'AD. Note that for subjects 3,4 and 5 there are more than one cases of 'CN'. In these cases I want to keep the closest row to the first appearance of dx1 == 'AD'.

In the following dataframe are marked the rows that I want to keep. enter image description here

Upvotes: 1

Views: 61

Answers (1)

jezrael
jezrael

Reputation: 863226

Idea is remove not rows with AD, CN by filtering in first step and then filter pattern AD, CN and CN, AD per groups:

#create deafult index - used for matching final ouput
df = df.reset_index(drop=True)

df1 = df[df['dx1'].isin(['AD','CN'])]

s1 = df1.groupby('Subject')['dx1'].shift(-1)
s2 = df1.groupby('Subject')['dx1'].shift()

cn = df1['dx1'].eq('CN')
ad = df1['dx1'].eq('AD')

m1 = cn & s1.eq('AD')
m2 = ad & s2.eq('CN')

m11 = ad & s1.eq('CN')
m22 = cn & s2.eq('AD')

df2 = df1[m1 | m2 | m11 | m22].copy()

print (df2)
    Subject dx1
0         1  CN
1         1  AD
3         2  CN
5         2  AD
6         3  AD
7         3  CN
9         4  CN
10        4  AD
11        4  CN
13        5  CN
14        5  AD

Then filter only CN values and drop duplicates by both columns for final values:

mask = df2['dx1'].eq('CN')

idx = df2[mask].drop_duplicates(subset=['Subject','dx1']).index

df3 = df.loc[idx]
print (df3)
    Subject dx1
0         1  CN
3         2  CN
7         3  CN
9         4  CN
13        5  CN

Or for new column:

df['new'] = df.index.isin(idx)
print (df)
    Subject  dx1    new
0         1   CN   True
1         1   AD  False
2         1   AD  False
3         2   CN   True
4         2  UNC  False
5         2   AD  False
6         3   AD  False
7         3   CN   True
8         3   CN  False
9         4   CN   True
10        4   AD  False
11        4   CN  False
12        5   CN  False
13        5   CN   True
14        5   AD  False
15        5   AD  False

Upvotes: 1

Related Questions