Reputation: 131
Suppose I have a pandas dataframe like this:
        Word  Ratings
0   TLYSFFPK        1
1  SVLENFVGR        2
2  SVFNHAIRK        3
3  KAGEVFIHK        4
How can I use a regex in pandas to keep only the rows whose word matches the following pattern, while preserving the DataFrame layout? The regex pattern is: \b.[VIFY][MLFYIA]\w+[LIYVF].[KR]\b
Expected output:
        Word  Ratings
1  SVLENFVGR        2
2  SVFNHAIRK        3
Upvotes: 3
Views: 11549
Reputation: 210832
Demo:
In [2]: df
Out[2]:
        Word  Ratings
0   TLYSFFPK        1
1  SVLENFVGR        2
2  SVFNHAIRH        3
3  KAGEVFIHK        4

In [3]: pat = r'\b.[VIFY][MLFYIA]\w+[LIYVF].[KR]\b'

In [4]: df.Word.str.contains(pat)
Out[4]:
0    False
1     True
2    False
3    False
Name: Word, dtype: bool

In [5]: df[df.Word.str.contains(pat)]
Out[5]:
        Word  Ratings
1  SVLENFVGR        2
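
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same approach. It builds the frame from the data in the question; the `na=False` argument and the names `mask` and `result` are additions of mine for illustration, not part of the demo above.

```python
import pandas as pd

# Sample data copied from the question (row 2 ends in "K" there).
df = pd.DataFrame({
    "Word": ["TLYSFFPK", "SVLENFVGR", "SVFNHAIRK", "KAGEVFIHK"],
    "Ratings": [1, 2, 3, 4],
})

pat = r"\b.[VIFY][MLFYIA]\w+[LIYVF].[KR]\b"

# Boolean mask: True where the pattern is found in the word.
# na=False treats missing values as non-matches (an added safeguard,
# not something the original demo uses).
mask = df["Word"].str.contains(pat, regex=True, na=False)

# Boolean indexing keeps the matching rows with their original
# index labels and columns.
result = df[mask]
print(result)
```

With the question's data this keeps rows 1 and 2. The demo above keeps only row 1 because its row 2 reads SVFNHAIRH, which does not end in K or R and so fails the final `[KR]\b` part of the pattern.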
Upvotes: 12