Baptiste

Reputation: 93

Pandas: What is the difference between isin() and str.contains()?

I want to know if a specific string is present in some columns of my dataframe (a different string for each column). From what I understand isin() is written for dataframes but can work for Series as well, while str.contains() works better for Series.

I don't understand how I should choose between the two. (I searched for similar questions but didn't find any explanation on how to choose between the two.)

Upvotes: 9

Views: 36571

Answers (1)

DeepSpace

Reputation: 81604

.isin checks if each value in the column is contained in a list of arbitrary values. Roughly equivalent to value in [value1, value2].

str.contains checks if a given pattern (a substring or a regex) is contained in each value in the column. Roughly equivalent to substring in large_string.

In other words, .isin compares each whole value against the given list and is available for all data types. str.contains searches inside each value and makes sense only when dealing with strings (or values that can be represented as strings).
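To make the contrast concrete, here is a minimal sketch on a small Series. The Series `s` and the lookup values are made up for illustration; `regex=False` is passed so that the pattern is treated as a plain substring.

```python
import pandas as pd

# Hypothetical data, for illustration only
s = pd.Series(["aa", "ba", "ca"])

# isin: exact whole-value match, like `value in ["aa", "a"]`.
# "a" matches nothing here because no element is exactly "a".
exact = s.isin(["aa", "a"])

# str.contains: substring search, like `"a" in value`
partial = s.str.contains("a", regex=False)

print(exact.tolist())    # [True, False, False]
print(partial.tolist())  # [True, True, True]
```

So "a" matches nothing with isin (no element equals "a"), but it matches every row with str.contains (every element contains "a").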

From the official documentation:

Series.isin(values)

Check whether values are contained in Series. Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.


Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)

Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
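Two parameters in that signature tend to matter in practice: regex (on by default, so characters such as "." are treated as regex metacharacters) and na (what to return for missing values). A small sketch, with made-up sample data:

```python
import pandas as pd

# Hypothetical data: one literal dot, one lookalike, one missing value
s = pd.Series(["a.b", "axb", None])

# Default regex=True: "." matches any character, so "axb" matches too.
# na=False makes missing values count as "no match".
as_regex = s.str.contains("a.b", na=False)

# regex=False: the pattern is a literal substring, so only "a.b" matches.
as_literal = s.str.contains("a.b", regex=False, na=False)

print(as_regex.tolist())    # [True, True, False]
print(as_literal.tolist())  # [True, False, False]
```

If the string you are looking for is plain text rather than a pattern, passing regex=False avoids surprises with special characters.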

Examples:

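import pandas as pd

df = pd.DataFrame({'a': ['aa', 'ba', 'ca']})

print(df)
#     a
# 0  aa
# 1  ba
# 2  ca

# Keep rows whose value is exactly 'aa' or 'ca'
print(df[df['a'].isin(['aa', 'ca'])])
#     a
# 0  aa
# 2  ca

# Keep rows whose value contains the substring 'b'
print(df[df['a'].str.contains('b')])
#     a
# 1  ba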
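For the case in the question (a different string for each column), one possible approach is sketched below. The column names, data, and search strings are assumptions for illustration. DataFrame.isin accepts a dict mapping column names to allowed values, while str.contains has to be applied one column at a time because it is a Series method.

```python
import pandas as pd

# Hypothetical two-column frame and per-column search strings
df = pd.DataFrame({"a": ["aa", "ba", "ca"], "b": ["xx", "yy", "zz"]})

# Exact match per column: dict of column name -> list of allowed values
exact = df.isin({"a": ["aa"], "b": ["yy"]})

# Substring match per column: run str.contains on each column separately
patterns = {"a": "a", "b": "y"}
partial = pd.DataFrame(
    {col: df[col].str.contains(pat, regex=False) for col, pat in patterns.items()}
)

# Per column: is the string present anywhere?
print(exact.any().to_dict())    # {'a': True, 'b': True}
print(partial.any().to_dict())  # {'a': True, 'b': True}
```

Use the isin variant when the cell must equal the string, and the str.contains variant when the string may appear anywhere inside the cell.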

Upvotes: 24
