Reputation: 93
I want to know if a specific string is present in some columns of my dataframe (a different string for each column).
From what I understand, isin() is written for DataFrames but can work for Series as well, while str.contains() works better for Series.
I don't understand how I should choose between the two. (I searched for similar questions but didn't find any explanation on how to choose between the two.)
Upvotes: 9
Views: 36571
Reputation: 81604
.isin checks if each value in the column is contained in a list of arbitrary values. Roughly equivalent to value in [value1, value2].
str.contains checks if a given substring or pattern occurs inside each value in the column. Roughly equivalent to substring in large_string.
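In other words, .isin compares each whole value against a set of candidates (an exact match) and is available for all data types. str.contains looks inside each value for a substring or regex match and makes sense only when dealing with strings (or values that can be represented as strings). Both return a boolean result with one entry per element.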
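To illustrate the point about data types and exact versus partial matching, here is a minimal sketch. The Series names and values are made up for the example and are not from the question.

```python
import pandas as pd

# Hypothetical data, only for illustration
numbers = pd.Series([1, 2, 3])
words = pd.Series(['aa', 'ba', 'ca'])

# isin works on any dtype: exact membership in the given values
in_set = numbers.isin([1, 3])

# The .str accessor is only available for string data
try:
    numbers.str.contains('1')
    str_accessor_ok = True
except AttributeError:
    str_accessor_ok = False

# Converting to strings first makes str.contains usable
as_text = numbers.astype(str).str.contains('1', regex=False)

# 'b' is not equal to any whole value, but it is a substring of 'ba'
exact_b = words.isin(['b'])
has_b = words.str.contains('b', regex=False)

print(in_set.tolist(), str_accessor_ok, as_text.tolist())
print(exact_b.tolist(), has_b.tolist())
```

So if the strings you are looking for must equal the whole cell, isin is the natural fit; if they may appear anywhere inside the cell, str.contains is.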
From the official documentation:
Series.isin(values)
Check whether values are contained in Series. Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.
Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)
Test if pattern or regex is contained within a string of a Series or Index.
Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
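The regex, case and na parameters in that signature matter in practice, because the pattern is treated as a regular expression by default. A small sketch with made-up values (passing na=False so missing entries count as "no match"):

```python
import pandas as pd

# Hypothetical data, only for illustration
s = pd.Series(['a.b', 'axb', 'A.B', None])

# Default regex=True: '.' matches any character, so 'axb' matches too
regex_match = s.str.contains('a.b', na=False)

# regex=False: the pattern is taken literally
literal_match = s.str.contains('a.b', regex=False, na=False)

# case=False: ignore upper/lower case
ignore_case = s.str.contains('a.b', case=False, regex=False, na=False)

print(regex_match.tolist())
print(literal_match.tolist())
print(ignore_case.tolist())
```

If the string you search for can contain characters such as '.', '(' or '+', passing regex=False is probably what you want.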
Examples:
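import pandas as pd

df = pd.DataFrame({'a': ['aa', 'ba', 'ca']})
print(df)
#     a
# 0  aa
# 1  ba
# 2  ca

# Exact match: keep rows whose value equals one of the listed values
print(df[df['a'].isin(['aa', 'ca'])])
#     a
# 0  aa
# 2  ca

# Substring match: keep rows whose value contains 'b'
print(df[df['a'].str.contains('b')])
#     a
# 1  ba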
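For the situation in the question (a different string for each column), one possible approach is to keep the strings in a dict keyed by column name. This is only a sketch: the column names, values and search strings below are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical data and search strings, only for illustration
df = pd.DataFrame({
    'city': ['Paris', 'Parma', 'Rome'],
    'code': ['x1', 'y2', 'x3'],
})

# Substring search: one pattern per column, applied with str.contains
substrings = {'city': 'Par', 'code': 'x'}
contains = pd.DataFrame({
    col: df[col].str.contains(pat, regex=False, na=False)
    for col, pat in substrings.items()
})

# Exact search: DataFrame.isin accepts a dict of column -> allowed values
exact_values = {'city': ['Paris'], 'code': ['x1', 'x3']}
is_exact = df.isin(exact_values)

# Is the string present anywhere in each column?
present = contains.any()

print(contains)
print(is_exact)
print(present)
```

Here contains and is_exact are boolean DataFrames with the same shape as the selected columns, and .any() reduces each column to a single True/False.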
Upvotes: 24