Reputation: 21552
Starting from this simple dataframe df:
col1,col2
1,3
2,1
3,8
I would like to apply a boolean mask based on the name of the column. I know that this is easy for values:
mask = df <= 1
df = df[mask]
which returns:
mask:
col1 col2
0 True False
1 False True
2 False False
df:
col1 col2
0 1 NaN
1 NaN 1
2 NaN NaN
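A minimal runnable version of the value-based case, rebuilding df from the CSV above. Indexing a DataFrame with a boolean DataFrame behaves like DataFrame.where: cells where the mask is True are kept and the rest become NaN.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [3, 1, 8]})

# Element-wise comparison gives a boolean DataFrame with df's shape.
mask = df <= 1

# Boolean-DataFrame indexing keeps True cells and sets the rest to NaN.
filtered = df[mask]
print(filtered)

# The same result via the explicit method.
assert filtered.equals(df.where(mask))
```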
as expected. Now I would like to obtain a boolean mask based on the column name, something like:
mask = df == df['col1']
which should return:
mask
col1 col2
0 True False
1 True False
2 True False
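Since this mask depends only on the column labels, one way to build it directly (a sketch, not a built-in pandas feature) is to compute one flag per column and repeat it for every row with numpy.tile. The target name "col1" is taken from the example data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [3, 1, 8]})

# One flag per column: True only for the column we want to keep.
col_flags = df.columns == "col1"  # array([ True, False])

# Repeat the flags once per row and keep df's index and column labels.
mask = pd.DataFrame(
    np.tile(col_flags, (len(df), 1)),
    index=df.index,
    columns=df.columns,
)
print(mask)
```

Because this never looks at the cell values, duplicated values or NaN in the data cannot change the result.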
EDIT:
This seems weird, but I need this kind of mask to filter seaborn heatmaps by column later.
Upvotes: 5
Views: 24282
Reputation: 31672
You could transpose your dataframe, then compare it with the column, and then transpose back. A bit weird, but here is a working example:
import pandas as pd
from io import StringIO
data = """
col1,col2
1,3
2,1
3,8
"""
df = pd.read_csv(StringIO(data))
mask = (df.T == df['col1']).T
In [176]: df
Out[176]:
col1 col2
0 1 3
1 2 1
2 3 8
In [178]: mask
Out[178]:
col1 col2
0 True False
1 True False
2 True False
EDIT
I found another way to do it: you could use the isin method:
In [41]: df.isin(df.col1)
Out[41]:
col1 col2
0 True False
1 True False
2 True False
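Both of these approaches compare cell contents rather than column names, so they can mark other columns too. A sketch of one such case, assuming (as the EDIT2 example suggests) that col2 holds the same values as col1:

```python
import pandas as pd

# col2 deliberately duplicates col1's values.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [1, 2, 3]})

via_transpose = (df.T == df["col1"]).T
via_isin = df.isin(df.col1)

# Both mark col2 as True as well, even though only col1 was intended.
print(via_transpose)
print(via_isin)
```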
EDIT2
As @DSM showed in the comments, these two approaches do not work correctly in some cases (for example, when another column contains the same values as col1). So you should use @KT.'s method. But let's play a bit more with transpose:
df.col2 = df.col1
In [149]: df
Out[149]:
col1 col2
0 1 1
1 2 2
2 3 3
In [147]: df.isin(df.T[df.columns == 'col1'].T)
Out[147]:
col1 col2
0 True False
1 True False
2 True False
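The expression df.T[df.columns == 'col1'].T just selects col1 as a one-column DataFrame, so it should be equivalent to df[['col1']]. When isin receives a DataFrame, it matches on both index and column labels, which is why col2 (absent from the argument) comes out False.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [3, 1, 8]})
df.col2 = df.col1  # reproduce the example above

# Transpose trick from the answer vs. plain column selection.
via_transpose = df.isin(df.T[df.columns == "col1"].T)
via_select = df.isin(df[["col1"]])

assert via_transpose.equals(via_select)
print(via_select)
```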
Upvotes: 0
Reputation: 11430
As noted in the comments, situations where you would need a "mask" like that seem rare (and chances are, you are not in one of them). Consequently, there is probably no nice "built-in" solution for them in Pandas.
Nonetheless, you can achieve what you need with a hack like the following:
mask = (df == df) & (df.columns == 'col1')
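A runnable version of this hack on the question's data. df == df is True wherever a cell is not NaN, and the & operator matches the 1-D array df.columns == 'col1' against the columns, broadcasting it down every row.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [3, 1, 8]})

# (df == df) is True for every non-NaN cell; the 1-D array
# (df.columns == "col1") is matched against the columns.
mask = (df == df) & (df.columns == "col1")
print(mask)
```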
Update: As noted in the comments, if your data frame contains nulls, the mask computed this way will always be False at the corresponding locations (because NaN != NaN). If this is a problem, the safer option is:
mask = ((df == df) | df.isnull()) & (df.columns == 'col1')
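To see the difference the update describes, here is a sketch with one NaN placed in col1 (the NaN is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.nan, 3.0], "col2": [3, 1, 8]})

flags = df.columns == "col1"

# NaN != NaN, so df == df is False at the NaN cell.
naive = (df == df) & flags
# OR-ing in df.isnull() makes the value part True everywhere.
safe = ((df == df) | df.isnull()) & flags

print(naive["col1"].tolist())  # [True, False, True]
print(safe["col1"].tolist())   # [True, True, True]
```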
Upvotes: 7