Reputation: 255
I want to filter a DataFrame down to the rows where one column's value appears in a list:
df[df['column'].isin(mylist)]
But I found that this is case-sensitive. Is there a way to use ".isin()" case-insensitively?
Upvotes: 23
Views: 24945
Reputation: 2993
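Since pandas 1.1.0, you can use the str.fullmatch function with your list joined into a single regex pattern of | alternatives. fullmatch has a case option to turn off case sensitivity. Because the pattern is a regular expression, entries containing regex metacharacters need escaping (for example with re.escape).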
Example: consider the dataframe df:
   Apple    Banana Orange
0      A       Boy    Cat
1   Ivan  Elephant   Gold
df.Apple.str.lower().isin(['a', 'c', 'd', 'elephant'])
returns the following result:
0 True
1 False
Name: Apple, dtype: bool
while df.Banana.str.lower().isin(['a', 'c', 'd', 'elephant'])
returns:
0 False
1 True
Name: Banana, dtype: bool
To achieve the same result, one can use
df.Apple.str.fullmatch('a|c|d|elephant', case=False)
and
df.Banana.str.fullmatch('a|c|d|elephant', case=False)
which give, respectively:
0 True
1 False
Name: Apple, dtype: bool
and
0 False
1 True
Name: Banana, dtype: bool
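When the values come from a Python list rather than a hand-written pattern, the alternation can be built programmatically. The following is a minimal sketch, assuming pandas 1.1.0 or later; the names mylist, pattern, apple_mask and banana_mask are illustrative and not part of the answer above. Each entry is escaped so that characters such as "." or "+" match literally, and the alternatives are wrapped in a non-capturing group so they are treated as one unit.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "Apple": ["A", "Ivan"],
        "Banana": ["Boy", "Elephant"],
        "Orange": ["Cat", "Gold"],
    }
)
mylist = ["a", "c", "d", "elephant"]

# Escape every entry so regex metacharacters are matched literally,
# then join the entries into one grouped alternation.
pattern = "(?:" + "|".join(re.escape(s) for s in mylist) + ")"

# case=False makes the whole-string match case-insensitive.
apple_mask = df["Apple"].str.fullmatch(pattern, case=False)
banana_mask = df["Banana"].str.fullmatch(pattern, case=False)

print(df[apple_mask])
print(df[banana_mask])
```

For plain literal values, the str.lower().isin(...) form is usually simpler; the regex route mainly pays off when the entries are genuinely patterns.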
Upvotes: 1
Reputation: 128
I would put my list into a CSV and load it as a dataframe. Afterwards I would run the command:
df_done = df[df["Server Name"].str.lower().isin(df_compare["Computer Name"].str.lower())]
This avoids an explicit for loop and can handle large amounts of data easily.
df = 5000 rows
df_compare = 1000 rows
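A small self-contained sketch of this approach follows. The CSV content is held in an in-memory string (via io.StringIO) purely so the example runs without a file on disk, and the sample server names are made up for illustration; only the column names "Server Name" and "Computer Name" come from the command above.

```python
import io

import pandas as pd

# Stand-in for the CSV file holding the list of names (hypothetical data).
csv_text = "Computer Name\nWEB01\ndb02\n"
df_compare = pd.read_csv(io.StringIO(csv_text))

df = pd.DataFrame({"Server Name": ["web01", "DB02", "cache03"]})

# Lowercase both sides, then keep the rows whose name appears in df_compare.
# isin() compares values only; the index of df_compare is ignored.
df_done = df[
    df["Server Name"].str.lower().isin(df_compare["Computer Name"].str.lower())
]

print(df_done)
```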
Upvotes: 0
Reputation: 38415
One way is to compare the lowercased (or uppercased) Series against the list converted the same way:
df[df['column'].str.lower().isin([x.lower() for x in mylist])]
The advantage here is that nothing is written back to the original df or the list: the lowercased copies exist only for the comparison, so both keep their original casing.
Consider this dummy df:
Color Val
0 Green 1
1 Green 1
2 Red 2
3 Red 2
4 Blue 3
5 Blue 3
For the list l:
l = ['green', 'BLUE']
You can use isin()
df[df['Color'].str.lower().isin([x.lower() for x in l])]
You get
Color Val
0 Green 1
1 Green 1
4 Blue 3
5 Blue 3
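If this filter is needed in several places, it can be wrapped in a small function. The sketch below is one possible way to do that; the helper name isin_ci is hypothetical, not a pandas API. It uses str.casefold() instead of str.lower(), which is Python's more aggressive form of lowercasing intended for caseless comparison (for example, it treats "ß" like "ss"). Missing values in the column simply come out as False.

```python
import pandas as pd


def isin_ci(series, values):
    """Case-insensitive variant of Series.isin (hypothetical helper)."""
    # Fold the lookup values once; a set removes duplicates.
    folded = {str(v).casefold() for v in values}
    # Fold the column the same way and test membership.
    return series.str.casefold().isin(folded)


df = pd.DataFrame(
    {
        "Color": ["Green", "Green", "Red", "Red", "Blue", "Blue"],
        "Val": [1, 1, 2, 2, 3, 3],
    }
)
l = ["green", "BLUE"]

result = df[isin_ci(df["Color"], l)]
print(result)
```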
Upvotes: 42
Reputation: 13690
I prefer to use the general .apply
myset = {s.lower() for s in mylist}
df[df['column'].apply(lambda v: v.lower() in myset)]
A lookup in a set is faster than a lookup in a list.
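One caveat with the .apply version: v.lower() raises an AttributeError when a cell is not a string (for example None, NaN, or a number). A minimal sketch of a guarded variant is shown below, with made-up sample data; the isinstance check is an addition for illustration, not part of the answer above.

```python
import pandas as pd

mylist = ["Green", "BLUE"]
myset = {s.lower() for s in mylist}

# Sample column mixing strings with a missing value and a number.
df = pd.DataFrame({"column": ["green", "Red", None, 7, "Blue"]})

# Non-string cells short-circuit to False instead of raising.
mask = df["column"].apply(lambda v: isinstance(v, str) and v.lower() in myset)
result = df[mask]

print(result)
```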
Upvotes: 2
Reputation: 5193
Use the .str accessor to get the lowercase version of the column, then call isin on the result:
In [23]: df = pd.DataFrame([['A', 'B', 'C'], ['D', 'E', 6]], columns=['A', 'B', 'C'])
In [24]: df
Out[24]:
A B C
0 A B C
1 D E 6
In [25]: df.A
Out[25]:
0 A
1 D
Name: A, dtype: object
In [26]: df.A.str.lower().isin(['a', 'b', 'c'])
Out[26]:
0 True
1 False
Name: A, dtype: bool
Upvotes: 1