Reputation: 236
I am trying to get a DataFrame from an existing DataFrame containing only the rows where the values in a certain column (whose values are strings) do not contain a certain character.
For example, if the character we don't want is '(':
Original dataframe:
   some_col my_column
0         1      some
1         2      word
2         3    hello(
New dataframe:
   some_col my_column
0         1      some
1         2      word
I have tried df.loc['(' not in df['my_column']], but this does not work since df['my_column'] is a Series object.
I have also tried df.loc[not df.my_column.str.contains('(')], which also does not work.
Upvotes: 6
Views: 9176
Reputation: 294358
If you are looking to filter out just that character:
str.contains
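Escape the open paren. str.contains treats its pattern as a regular expression by default, so some characters, '(' among them, are interpreted as special regex characters. You can escape them with a backslash; use a raw string so that Python passes the backslash through unchanged.
df[~df.my_column.str.contains(r'\(')]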
   some_col my_column
0         1      some
1         2      word
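As a hedged aside, the escaping can be avoided altogether by asking str.contains for a literal match with regex=False. A minimal sketch, assuming the sample frame from the question (rebuilt here so the snippet runs on its own; the names has_paren and filtered are just for illustration):

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "some_col": [1, 2, 3],
    "my_column": ["some", "word", "hello("],
})

# regex=False treats '(' as a plain substring, so no escaping is needed.
has_paren = df["my_column"].str.contains("(", regex=False)

# ~ inverts the boolean Series element-wise, which Python's `not` cannot do.
filtered = df[~has_paren]
print(filtered)
```

This also explains why the attempt with not in the question fails: not tries to reduce the whole Series to a single True/False, whereas ~ flips each element.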
str.match
Match only strings made up entirely of non-open-paren characters. By the way, this is a bad idea! Checking the whole string with a regex just to confirm that one character is absent is gross.
df[df.my_column.str.match(r'^[^\(]*$')]
   some_col my_column
0         1      some
1         2      word
in
df[['(' not in x for x in df.my_column]]
   some_col my_column
0         1      some
1         2      word
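One caveat with the list comprehension, offered as a hedged sketch rather than as part of the original answer: '(' not in x raises a TypeError when x is not a string, for example a missing value. The extra row with None and the decision to keep non-string rows are both assumptions made for illustration; drop them instead if that suits your data better.

```python
import pandas as pd

# Sample data from the question plus one hypothetical missing value.
df = pd.DataFrame({
    "some_col": [1, 2, 3, 4],
    "my_column": ["some", "word", "hello(", None],
})

# Only test membership on real strings; anything else counts as
# "does not contain '('" (an assumption, see the note above).
mask = [not (isinstance(x, str) and "(" in x) for x in df["my_column"]]

kept = df[mask]
print(kept)
```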
Upvotes: 2
Reputation: 402653
You're looking for str.isalpha, which keeps only the rows whose strings consist entirely of letters:
df[df.my_column.str.isalpha()]
   some_col my_column
0         1      some
1         2      word
A similar method is str.isalnum, if you want to retain letters and digits.
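To make the difference between the two methods concrete, here is a small sketch on a made-up Series (the sample values are assumptions, chosen to include a digit, a paren, and a space):

```python
import pandas as pd

# Hypothetical sample values, not from the question.
s = pd.Series(["some", "word2", "hello(", "two words"])

# True only when every character is a letter.
only_letters = s.str.isalpha()

# True only when every character is a letter or a digit.
letters_or_digits = s.str.isalnum()

print(s[only_letters].tolist())
print(s[letters_or_digits].tolist())
```

Note that both methods reject strings containing spaces, so they are stricter than just filtering out '('.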
If you want to allow word characters (letters, digits, and underscores) plus whitespace, and drop every row containing anything else, use
df[~df.my_column.str.contains(r'[^\w\s]')]
   some_col my_column
0         1      some
1         2      word
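For a feel of what the [^\w\s] pattern lets through, here is a rough sketch on a made-up Series (the sample values and the names has_other and kept are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample values, not from the question.
s = pd.Series(["some", "two words", "word_2", "hello("])

# [^\w\s] matches any single character that is neither a word character
# (letter, digit, underscore) nor whitespace.
has_other = s.str.contains(r"[^\w\s]")

# Keep the rows where no such character was found.
kept = s[~has_other]
print(kept.tolist())
```

So spaces, digits, and underscores survive this filter, while the paren does not.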
Lastly, if you are looking to remove punctuation as a whole, I've written a Q&A here which might be a useful read: Fast punctuation removal with pandas
Upvotes: 9