Reputation: 1754
I am trying to filter a pandas
dataframe using regular expressions
.
I want to delete those rows that do not contain any letters. For example:
Col A.
50000
$927848
dog
cat 583
rabbit 444
My desired results is:
Col A.
dog
cat 583
rabbit 444
I have been trying to solve this problem unsuccessful with regex
and pandas
filter options. See blow. I am specifically running into problems when I try to merge two conditions for the filter. How can I achieve this?
Option 1:
df['Col A.'] = ~df['Col A.'].filter(regex='\d+')
Option 2
df['Col A.'] = df['Col A.'].filter(regex=\w+)
Option 3
from string import digits, letters
df['Col A.'] = (df['Col A.'].filter(regex='|'.join(letters)))
OR
df['Col A.'] = ~(df['Col A.'].filter(regex='|'.join(digits)))
OR
df['Col A.'] = df[~(df['Col A.'].filter(regex='|'.join(digits))) & (df['Col A.'].filter(regex='|'.join(letters)))]
Upvotes: 8
Views: 20925
Reputation: 862921
I think you'd need str.contains
to filter values which contain letters by the means of boolean indexing
:
df = df[df['Col A.'].str.contains('[A-Za-z]')]
print (df)
Col A.
2 dog
3 cat 583
4 rabbit 444
If there are some NaN
s values you can pass a parameter:
df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print (df)
Col A.
3 dog
4 cat 583
5 rabbit 444
Upvotes: 25
Reputation: 21
df['Col A.'].str.contains(r'^\d+$', na=True)
# if string with only digits or if int/float then will result in NaN converted to True
eg: [50000, '$927848', 'dog', 'cat 583', 'rabbit 444', '3 e 3', 'e 3', '33', '3 e'] will give : [True,False,False,False,False,False,False, True,False]
Upvotes: 2
Reputation: 1287
You can use ^.*[a-zA-Z].*$
https://regex101.com/r/b84ji1/1
Details
^
: Start of the line
.*
: Match any character
[a-zA-Z]
: Match letters
$
: End of the line
Upvotes: 1
Reputation: 842
Have you tried:
df['Col A.'].filter(regex=r'\D') # Keeps only if there's a non-digit character
or:
df['Col A.'].filter(regex=r'[A-Za-z]') # Keeps only if there's a letter (alpha)
or:
df['Col A.'].filter(regex=r'[^\W\d_]') # More info in the link below...
Explanation: https://stackoverflow.com/a/2039476/8933502
Upvotes: 3