Reputation: 340
I have following dataframe
id pattern1 pattern2 pattern3
1 a-b-c a-b-- a-b-c
2 a-a-- a-b-- a-c--
3 a-v-- a-m-- a-k--
4 a-b-- a-n-- a-n-c
I want to filter rows where the value in every column ends with the pattern --. In this case the output would be
2 a-a-- a-b-- a-c--
3 a-v-- a-m-- a-k--
So far I can only think of doing something like the following
df[(len(df['pattern1'].str.split('--')[1])==0) & \
(len(df['pattern2'].str.split('--')[1])==0) & \
(len(df['pattern3'].str.split('--')[1])==0)]
This doesn't work. Also, I can't write out the names of all the columns, as there are 20 of them. How can I filter rows where all the columns in that row match a certain pattern/condition?
Upvotes: 1
Views: 253
Reputation: 403198
Start with setting "id" as the index, if not yet done.
df = df.set_index('id')
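For reference, the example frame from the question can be rebuilt like this (a minimal sketch; the question's data, with "id" set as the index in one step):

```python
import pandas as pd

# Reconstruct the example frame from the question
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'pattern1': ['a-b-c', 'a-a--', 'a-v--', 'a-b--'],
    'pattern2': ['a-b--', 'a-b--', 'a-m--', 'a-n--'],
    'pattern3': ['a-b-c', 'a-c--', 'a-k--', 'a-n-c'],
}).set_index('id')
```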
One option to check each string is using applymap, calling str.endswith:
df[df.applymap(lambda x: x.endswith('--')).all(1)]
pattern1 pattern2 pattern3
id
2 a-a-- a-b-- a-c--
3 a-v-- a-m-- a-k--
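Note that applymap was deprecated in pandas 2.1 in favor of the elementwise DataFrame.map. A minimal sketch of the same filter that works on both older and newer versions (falling back to applymap when DataFrame.map is unavailable):

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    'pattern1': ['a-b-c', 'a-a--', 'a-v--', 'a-b--'],
    'pattern2': ['a-b--', 'a-b--', 'a-m--', 'a-n--'],
    'pattern3': ['a-b-c', 'a-c--', 'a-k--', 'a-n-c'],
}, index=pd.Index([1, 2, 3, 4], name='id'))

# DataFrame.map (pandas >= 2.1) is the renamed applymap;
# on older versions getattr falls back to applymap
elementwise = getattr(df, 'map', df.applymap)
mask = elementwise(lambda x: x.endswith('--')).all(axis=1)
result = df[mask]
```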
Another option is apply, calling pd.Series.str.endswith for each column:
df[df.apply(lambda x: x.str.endswith('--')).all(1)]
pattern1 pattern2 pattern3
id
2 a-a-- a-b-- a-c--
3 a-v-- a-m-- a-k--
Lastly, for performance, you can AND masks inside a list comprehension using np.logical_and.reduce:
import numpy as np

# m = np.logical_and.reduce([df[c].str.endswith('--') for c in df.columns])
m = np.logical_and.reduce([
    [x.endswith('--') for x in df[c]] for c in df.columns])
m
# array([False, True, True, False])
df[m]
pattern1 pattern2 pattern3
id
2 a-a-- a-b-- a-c--
3 a-v-- a-m-- a-k--
If there are other columns, but you only want to consider those named "pattern*", you can use filter on the DataFrame:
u = df.filter(like='pattern')
Now repeat the options above using u; for example, the first option becomes
df[u.applymap(lambda x: x.endswith('--')).all(1)]
...and so on.
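A quick sketch of that last case (the extra "label" column here is hypothetical, just to show that non-pattern columns are ignored by the check but kept in the result): filter restricts the endswith test to the pattern* columns, while the resulting boolean mask still selects rows of the full frame.

```python
import pandas as pd

df = pd.DataFrame({
    'pattern1': ['a-b-c', 'a-a--', 'a-v--', 'a-b--'],
    'pattern2': ['a-b--', 'a-b--', 'a-m--', 'a-n--'],
    'pattern3': ['a-b-c', 'a-c--', 'a-k--', 'a-n-c'],
    'label': ['w', 'x', 'y', 'z'],  # hypothetical non-pattern column
}, index=pd.Index([1, 2, 3, 4], name='id'))

u = df.filter(like='pattern')        # only the pattern* columns
mask = u.apply(lambda c: c.str.endswith('--')).all(axis=1)
result = df[mask]                    # row mask applies to the full frame
```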
Upvotes: 4