Reputation: 131
I want to select rows from my dataframe df where any of the many columns contains a value that's in a list my_list. There are dozens of columns, and there could be more in the future, so I don't want to iterate over each column in a list.
I don't want this:
# for loop / iteration
for col in df.columns:
    df.loc[df[col].isin(my_list), "indicator"] = 1
Nor this:
# really long indexing
df = df[df.col1.isin(my_list) | df.col2.isin(my_list) | df.col3.isin(my_list) ... df.col_N.isin(my_list)]  # ad nauseam
Nor do I want to reshape the dataframe from a wide to a long format.
I'm thinking (hoping) there's a way to do this in one line, applying isin() to many columns all at once.
Thanks!
I ended up using
df[df.isin(my_list).any(axis=1)]
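For anyone landing here, a minimal self-contained sketch of that one-liner, broken into steps. The sample frame and list values are made up for illustration (they are not my real data): isin() builds a boolean frame of the same shape as df, and any(axis=1) collapses it to one boolean per row.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [1, 4, 7], "col_3": [10, 12, 18]})
my_list = [3, 12]

# Boolean frame: True wherever a cell's value is in my_list
hits = df.isin(my_list)

# One boolean per row: True if any column matched
row_mask = hits.any(axis=1)

# Keep only the matching rows
selected = df[row_mask]
print(selected)
```

Because the mask is built from whatever columns df has at the time, columns added later are picked up without changing the code.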
Upvotes: 2
Views: 752
Reputation: 8826
Alternatively, you may try:
df[df.apply(lambda x: x.isin(mylist)).any(axis=1)]
OR
df[df[df.columns].isin(mylist).any(axis=1)]
You don't even need to create a list unless it is strictly necessary; you can pass the values directly:
df[df[df.columns].isin([3, 12]).any(axis=1)]
Checking this against your attempts:
>>> df
   col_1  col_2  col_3
0      1      1     10
1      2      4     12
2      3      7     18
>>> mylist
[3, 12]
>>> df[df.col_1.isin(mylist) | df.col_2.isin(mylist) | df.col_3.isin(mylist)]
   col_1  col_2  col_3
1      2      4     12
2      3      7     18
>>> df[df.isin(mylist).any(axis=1)]
   col_1  col_2  col_3
1      2      4     12
2      3      7     18
Or:
>>> df[df[df.columns].isin(mylist).any(axis=1)]
   col_1  col_2  col_3
1      2      4     12
2      3      7     18
Or:
>>> df[df.apply(lambda x: x.isin(mylist)).any(axis=1)]
   col_1  col_2  col_3
1      2      4     12
2      3      7     18
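If only some of the columns should be searched, the same pattern should work on a column subset. A rough sketch reusing the sample frame from this answer; the name cols and the choice of columns are hypothetical, picked just to show the effect.

```python
import pandas as pd

# Sample frame from the session above
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [1, 4, 7], "col_3": [10, 12, 18]})
mylist = [3, 12]

# Hypothetical subset of columns to search
cols = ["col_1", "col_2"]

# Only col_1 and col_2 are tested, so the 12 in col_3 no longer counts as a match
result = df[df[cols].isin(mylist).any(axis=1)]
print(result)
```

All columns are still returned for the matching rows; only the test is limited to cols.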
Upvotes: 2
Reputation: 38425
You can use DataFrame.isin(), which is a DataFrame method and not a string method, and reduce the boolean result row-wise with any(axis=1) to select the matching rows.
new_df = df[df.isin(my_list).any(axis=1)]
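One thing worth checking: indexing with the boolean frame from isin() and indexing with its row-wise any(axis=1) reduction behave differently. A small sketch, assuming a made-up three-column frame (not the asker's data), that compares the two:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [1, 4, 7], "col_3": [10, 12, 18]})
my_list = [3, 12]

# Indexing with the boolean frame keeps every row; non-matching cells become NaN
masked = df[df.isin(my_list)]

# Reducing with any(axis=1) yields a row mask, so only matching rows are kept
filtered = df[df.isin(my_list).any(axis=1)]

print(masked.shape)    # same shape as df
print(filtered.shape)  # fewer rows than df
```

The first form is a cell-level mask, the second a row filter, so the second is the one that answers "select rows where any column matches".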
Upvotes: 2