Sean Payne
Sean Payne

Reputation: 1961

Pandas Filtering by Substring

I have a dataframe with the structure

name          code
Big J          323
Big J          323
Big J          323
Big J Online   323
Big J Online   323
Big J Online   323
Code Base      476
Code Base      476

Sometimes there are duplicates that contain online. In this situation, I want to keep all of the online versions and remove the offline versions. If there is no online version, I want to keep the offline version.

The result would be

Big J Online   323
Big J Online   323
Big J Online   323
Code Base      476
Code Base      476

Is there an efficient way to filter this out?

Upvotes: 1

Views: 43

Answers (2)

jezrael
jezrael

Reputation: 862511

you can filter rows with Online, remove it by replace and filter out this rows in Series.isin (solution not working per groups):

vals = df.loc[df['name'].str.contains(' Online'), 'name'].str.replace(' Online','')

df = df[~df['name'].isin(vals)]
print (df)
           name  code
3  Big J Online   323
4  Big J Online   323
5  Big J Online   323
6     Code Base   476
7     Code Base   476

Same solution per groups:

m = df['name'].str.contains('Online')
f = lambda x: x['name'].isin(x['new'])
df = df[~df.assign(new = df.loc[m, 'name'].str.replace(' Online',''))
           .groupby('code', group_keys=False)
           .apply(f)]
print (df)
           name  code
3  Big J Online   323
4  Big J Online   323
5  Big J Online   323
6     Code Base   476
7     Code Base   476

Upvotes: 2

sammywemmy
sammywemmy

Reputation: 28644

One possible way to solve it is to create a temporary column that checks if 'online' exists in any of the values in name per code, then filter out those rows:

(df.assign(online = df['name'].str.contains('Online'), 
           checker = lambda df: df.groupby('code').online.transform('sum'))
   .loc[lambda df: ~((df.online.eq(0)) & (df.checker.gt(0))), df.columns]
)

          name  code
3  Big J Online   323
4  Big J Online   323
5  Big J Online   323
6     Code Base   476
7     Code Base   476

Upvotes: 0

Related Questions