nmog

Reputation: 236

Filtering out rows with non-alphanumeric characters

I am trying to derive a new DataFrame from an existing one, keeping only the rows where the values in a certain column (whose values are strings) do not contain a certain character.

For example, if the character we don't want is '(':

Original dataframe:

   some_col my_column
0         1      some
1         2      word
2         3    hello(

New dataframe:

   some_col my_column
0         1      some
1         2      word

I have tried df.loc['(' not in df['my_column']], but this does not work: df['my_column'] is a Series, so the in test checks the index labels and yields a single boolean rather than a per-row mask.

I have also tried df.loc[not df.my_column.str.contains('(')], which does not work either.

Upvotes: 6

Views: 9176

Answers (2)

piRSquared

Reputation: 294358

If you are looking to filter out just that character:

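Negation of str.contains

Escape the open paren. By default str.contains treats its pattern as a regular expression, and some characters, '(' among them, have a special meaning in regex. You can escape them with a backslash; use a raw string so that Python does not try to interpret the backslash itself.

df[~df.my_column.str.contains(r'\(')]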

   some_col my_column
0         1      some
1         2      word
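As an alternative to escaping by hand, here is a minimal sketch that rebuilds the question's DataFrame and filters it in two ways: passing regex=False so the pattern is taken as a literal substring, or letting re.escape add the backslash. The variable names (literal_mask, escaped_mask, filtered) are illustrative only.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"some_col": [1, 2, 3], "my_column": ["some", "word", "hello("]}
)

# Option 1: treat the pattern as a plain substring, so no escaping is needed.
literal_mask = df["my_column"].str.contains("(", regex=False)

# Option 2: keep regex matching, but let re.escape produce the escaped pattern.
escaped_mask = df["my_column"].str.contains(re.escape("("))

# Negate the mask to keep only the rows that do NOT contain '('.
filtered = df[~literal_mask]
print(filtered)
```

Both masks select the same rows here; regex=False is probably the simpler choice when the character to exclude is fixed and known in advance.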

str.match all non-open-paren

By the way, this is a bad idea! Using a regex to verify that the whole string is made of characters other than '(' is gross when all you need is to test for a single character.

df[df.my_column.str.match(r'^[^(]*$')]

   some_col my_column
0         1      some
1         2      word

Comprehension using in

df[['(' not in x for x in df.my_column]]

   some_col my_column
0         1      some
1         2      word
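One caveat with the comprehension: if the column can hold missing values, '(' not in x raises a TypeError for a float NaN. The sketch below assumes a hypothetical fourth row containing NaN (not part of the question's data) and guards the test with isinstance, which drops non-string entries instead of failing.

```python
import numpy as np
import pandas as pd

# The question's data plus one hypothetical row with a missing value.
df = pd.DataFrame(
    {
        "some_col": [1, 2, 3, 4],
        "my_column": ["some", "word", "hello(", np.nan],
    }
)

# isinstance short-circuits, so the `in` test only runs on real strings.
keep = [isinstance(x, str) and "(" not in x for x in df["my_column"]]

filtered = df[keep]
print(filtered)
```

Whether missing values should be dropped or kept depends on the data; flipping the guard to `not isinstance(x, str) or "(" not in x` would keep them instead.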

Upvotes: 2

cs95

Reputation: 402653

You're looking for str.isalpha, which keeps only the rows whose strings consist entirely of letters:

df[df.my_column.str.isalpha()]

   some_col my_column
0         1      some
1         2      word

A similar method is str.isalnum, if you want to retain letters and digits.

If you want to keep rows made up only of word characters (letters, digits, and underscores) and whitespace, use

df[~df.my_column.str.contains(r'[^\w\s]')]

   some_col my_column
0         1      some
1         2      word
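To make the difference between the three filters concrete, here is a small sketch on a Series that extends the question's values with two hypothetical entries, "abc123" and "two words". The names alpha_only, alnum_only, and word_or_space are illustrative.

```python
import pandas as pd

# Question's values plus two hypothetical ones containing digits and a space.
s = pd.Series(["some", "word", "hello(", "abc123", "two words"])

# Letters only: digits, spaces, and punctuation all disqualify a row.
alpha_only = s[s.str.isalpha()].tolist()

# Letters and digits: spaces and punctuation still disqualify a row.
alnum_only = s[s.str.isalnum()].tolist()

# Drop a row only if it contains a character that is neither a word
# character (\w) nor whitespace (\s).
word_or_space = s[~s.str.contains(r"[^\w\s]")].tolist()

print(alpha_only)     # ['some', 'word']
print(alnum_only)     # ['some', 'word', 'abc123']
print(word_or_space)  # ['some', 'word', 'abc123', 'two words']
```

On the question's three rows all three filters give the same result; they only diverge once digits or whitespace appear in the column.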

Lastly, if you are looking to remove punctuation as a whole, I've written a Q&A which might be a useful read: Fast punctuation removal with pandas

Upvotes: 9
