Filtering out rows with non-alphanumeric characters

Question

I am trying to build a new DataFrame from an existing one, keeping only the rows where the values in a certain column (whose values are strings) do not contain a certain character.

For example, if the character we don't want is '(':

Original dataframe:

   some_col my_column
0         1      some
1         2      word
2         3    hello(

New dataframe:

   some_col my_column
0         1      some
1         2      word

I have tried df.loc['(' not in df['my_column']], but this does not work since df['my_column'] is a Series object.

I have also tried: df.loc[not df.my_column.str.contains('(')], which also does not work.

cs95 · Accepted Answer

You're looking for str.isalpha, which is True only for strings made up entirely of letters:

df[df.my_column.str.isalpha()]

   some_col my_column
0         1      some
1         2      word

A similar method is str.isalnum, if you want to retain letters and digits.
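
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the DataFrame from the question and applies both methods. The variable names letters_only and alnum_only are my own, and it assumes my_column has no missing values.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "some_col": [1, 2, 3],
    "my_column": ["some", "word", "hello("],
})

# str.isalpha() is True only when every character in the string is a letter
letters_only = df[df["my_column"].str.isalpha()]

# str.isalnum() is True when every character is a letter or a digit
alnum_only = df[df["my_column"].str.isalnum()]

print(letters_only)
print(alnum_only)
```

Both masks drop the "hello(" row here, because "(" is neither a letter nor a digit.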

If you want to allow word characters (letters, digits and underscores) along with whitespace, use

df[~df.my_column.str.contains(r'[^\w\s]')]

   some_col my_column
0         1      some
1         2      word
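
As a rough sketch of why the attempts in the question fail and how the negated mask fits together: Python's not cannot be applied to a whole Series, so the element-wise ~ operator is used instead, and a bare '(' is not a valid regular expression, so it has to be escaped or matched literally with regex=False. The names has_other, has_paren, clean and no_paren below are just for illustration, and na=False is my assumption about how missing values should be treated (as "does not contain").

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "some_col": [1, 2, 3],
    "my_column": ["some", "word", "hello("],
})

# Regex version: flag rows holding any character that is neither a
# word character (\w) nor whitespace (\s), then invert the mask with ~
has_other = df["my_column"].str.contains(r"[^\w\s]", na=False)
clean = df[~has_other]

# Literal version for a single unwanted character: regex=False stops
# '(' from being parsed as the start of a regex group
has_paren = df["my_column"].str.contains("(", regex=False, na=False)
no_paren = df[~has_paren]

print(clean)
print(no_paren)
```

The literal version is the closest fix to the original attempt if only one specific character needs to be excluded.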

Lastly, if you are looking to remove punctuation as a whole, I've written a Q&A here which might be a useful read: Fast punctuation removal with pandas
