Reputation: 241
I am using str.contains on a large dataframe, and I need a way to get back only the records where str.contains is True. (The dataframe is several thousand rows long, and I am looking for 8 True results.)
Thanks!
aa = filtered_to_df.body.str.contains('AA')
aa.head(10)
Out[312]:
15864 False
18040 False
22576 False
28092 False
32800 False
33236 False
38027 False
41222 False
46647 False
87645 False
Name: body, dtype: bool
Upvotes: 3
Views: 5715
Reputation: 1352
Important distinction: str.contains does not actually filter your dataframe or series; it just returns a boolean vector of the same length as the series you applied it to.
For example, if you have a series like this:
import pandas as pd
my_series = pd.Series(['hello world', 'hello', 'world'])
print(my_series)
0 hello world
1 hello
2 world
dtype: object
Calling str.contains("hello") on this returns a series of size 3, since it gives you True / False for every cell in the series: does that cell contain the substring "hello"?
my_series = pd.Series(['hello world', 'hello', 'world'])
print(my_series.str.contains("hello"))
0 True
1 True
2 False
dtype: bool
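Because the result is a boolean series, summing it counts the matches, which is a quick way to check how many rows a filter would keep (for instance, whether you really get the 8 hits you expect). A small sketch using the same example series; the names mask and n_matches are only illustrative:

```python
import pandas as pd

my_series = pd.Series(['hello world', 'hello', 'world'])
mask = my_series.str.contains("hello")

# The mask has one entry per element of the original series.
assert len(mask) == len(my_series)

# True counts as 1, so the sum is the number of matching cells.
n_matches = int(mask.sum())
print(n_matches)
```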
To actually filter the dataframe or series, use that boolean result to index it (boolean indexing):
my_series[my_series.str.contains("hello")]
0 hello world
1 hello
dtype: object
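The same idea applies to a dataframe column. Here is a minimal sketch for the case in the question; the data is made up for illustration, and only the names filtered_to_df and body come from the question. Passing na=False is an assumption about what you want: it treats missing values as non-matches so the mask contains only True / False. regex=False matches 'AA' as a literal substring.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe; the rows are invented.
filtered_to_df = pd.DataFrame(
    {"body": ["AA batteries", "nothing here", None, "type AAA"]},
    index=[15864, 18040, 22576, 28092],
)

# Build the boolean mask; missing values count as "no match".
mask = filtered_to_df["body"].str.contains("AA", na=False, regex=False)

# Boolean indexing keeps only the rows where the mask is True.
matches = filtered_to_df[mask]
print(matches)
```

Without na=False, a missing value in body produces a missing value in the mask, and indexing with such a mask can raise an error, so it is worth setting explicitly if the column may contain NaN.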
Upvotes: 3