Reputation: 255

python pandas.Series.isin with case insensitive

I want to filter a DataFrame down to the rows whose value in one column appears in a list:

df[df['column'].isin(mylist)]

But I found that this is case-sensitive. Is there a way to use ".isin()" case-insensitively?

Upvotes: 23

Answers (5)

Ka Wa Yip

Reputation: 3009

As of pandas 1.1.0, you can use the str.fullmatch function with your list joined into a single pattern using | (alternation). fullmatch has a case option that turns off case sensitivity.

Example: consider the following dataframe df:

    Apple   Banana      Orange
0   A       Boy         Cat
1   Ivan    Elephant    Gold

df.Apple.str.lower().isin(['a', 'c', 'd', 'elephant']) returns the following result:

0     True
1    False
Name: Apple, dtype: bool

while df.Banana.str.lower().isin(['a', 'c', 'd', 'elephant']) returns:

0    False
1     True
Name: Banana, dtype: bool

Pandas fullmatch function, with case=False

To achieve the same result as above, one can use

df.Apple.str.fullmatch('a|c|d|elephant', case=False)

and

df.Banana.str.fullmatch('a|c|d|elephant', case=False)

which give the following results, respectively:

0     True
1    False
Name: Apple, dtype: bool

and

0    False
1     True
Name: Banana, dtype: bool
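
If the list is built at runtime, the pattern does not have to be typed by hand. The following is only a sketch, assuming the list is called mylist (the name used in the question). It escapes each entry with re.escape so that characters such as "." or "+" are matched literally, and wraps the alternation in a non-capturing group so the whole group is anchored as one unit.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Apple": ["A", "Ivan"],
    "Banana": ["Boy", "Elephant"],
    "Orange": ["Cat", "Gold"],
})

# Same values as in the example above; "mylist" is the name from the question.
mylist = ["a", "c", "d", "elephant"]

# Escape every entry, then join the entries into one alternation pattern.
# The (?:...) group keeps the alternatives together as a single unit.
pattern = "(?:" + "|".join(re.escape(item) for item in mylist) + ")"

apple_mask = df["Apple"].str.fullmatch(pattern, case=False)
banana_mask = df["Banana"].str.fullmatch(pattern, case=False)

print(apple_mask)
print(banana_mask)
```

The two masks should match the results shown above, and df[apple_mask] or df[banana_mask] then selects the rows.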

Upvotes: 1

Dayantat

Reputation: 128

I would put my list into a CSV and load it as a dataframe. Afterwards I would run the command:

df_done = df[df["Server Name"].str.lower().isin(df_compare["Computer Name"].str.lower())]

This avoids an explicit for loop and handles large amounts of data easily. In my case the sizes were:

df = 5000 rows
df_compare = 1000 rows
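
A small self-contained sketch of the same idea follows. The rows are made up for illustration, and the two frames are built inline instead of being loaded with pd.read_csv; only the column names "Server Name" and "Computer Name" come from the command above.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables; in practice df_compare
# would come from pd.read_csv on the file that holds the list.
df = pd.DataFrame({"Server Name": ["WEB01", "db02", "Cache03"]})
df_compare = pd.DataFrame({"Computer Name": ["web01", "CACHE03", "mail04"]})

# Lower-case both sides so the comparison ignores case.
wanted = df_compare["Computer Name"].str.lower()
df_done = df[df["Server Name"].str.lower().isin(wanted)]

print(df_done)
```

Here df_done keeps the WEB01 and Cache03 rows with their original spelling, because only the temporary lower-cased copies are compared.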

Upvotes: 0

Vaishali

Reputation: 38425

One way is to compare the lower-cased (or upper-cased) Series against the lower-cased (or upper-cased) list:

df[df['column'].str.lower().isin([x.lower() for x in mylist])]

The advantage here is that nothing is written back to the original df or the list; the lower-cased copies are temporary and used only for the comparison.

Consider this dummy df:

    Color   Val
0   Green   1
1   Green   1
2   Red     2
3   Red     2
4   Blue    3
5   Blue    3

For the list l:

l = ['green', 'BLUE']

You can use isin()

df[df['Color'].str.lower().isin([x.lower() for x in l])]

You get

    Color   Val
0   Green   1
1   Green   1
4   Blue    3
5   Blue    3

Upvotes: 42

Uri Goren

Reputation: 13700

I prefer to use the more general .apply:

myset = {s.lower() for s in mylist}
df[df['column'].apply(lambda v: v.lower() in myset)]

A lookup in a set is faster than a lookup in a list
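
One caveat: the lambda above calls v.lower() on every value, so it raises an AttributeError if the column contains missing values or other non-strings. A minimal sketch of a guarded version, using made-up data and the same names (mylist, myset, 'column'):

```python
import pandas as pd

mylist = ["green", "BLUE"]
myset = {s.lower() for s in mylist}

# Made-up column with one missing value to show the guard at work.
df = pd.DataFrame({"column": ["Green", None, "Red", "blue"]})

# Only call .lower() on real strings; anything else counts as "not in the set".
mask = df["column"].apply(lambda v: isinstance(v, str) and v.lower() in myset)
result = df[mask]

print(result)
```

With this data, result keeps the "Green" and "blue" rows and skips the missing value instead of failing on it.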

Upvotes: 2

Cory Madden

Reputation: 5203

Lower-case the values with the .str accessor, then call isin with a lower-case list:

In [23]: df = pd.DataFrame([['A', 'B', 'C'], ['D', 'E', 6]], columns=['A', 'B', 'C'])

In [24]: df
Out[24]: 
   A  B  C
0  A  B  C
1  D  E  6

In [25]: df.A
Out[25]: 
0    A
1    D
Name: A, dtype: object

In [26]: df.A.str.lower().isin(['a', 'b', 'c'])
Out[26]: 
0     True
1    False
Name: A, dtype: bool
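
Column C in this frame mixes a string with the integer 6. With the .str accessor, non-string values come back as missing after .str.lower(), so they never match. If such values should be compared as text, one option (a sketch, not the only way) is to cast with astype(str) first. The targets list here is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C'], ['D', 'E', 6]], columns=['A', 'B', 'C'])

# Hypothetical lookup values, already lower-case.
targets = ['c', '6']

# Without a cast, the integer 6 becomes a missing value and does not match.
mask_without_cast = df['C'].str.lower().isin(targets)

# Casting to str first turns 6 into '6', so it can match.
mask_with_cast = df['C'].astype(str).str.lower().isin(targets)

print(mask_without_cast)
print(mask_with_cast)
```

Note that astype(str) also turns missing values into text, so this cast only makes sense when that is acceptable for the column.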

Upvotes: 1
