triphook

Reputation: 3097

Mask cells where column name equals index (pandas)

I want to mask out the values in a Pandas DataFrame where the index is the same as the column name. For example:

import pandas as pd
import numpy as np

a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])

   a   b   c
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11

After masking:

     a    b    c
a  NaN    1    2
b    3  NaN    5
c    6    7  NaN
d    9   10   11

It seems simple enough, but I'm not sure how to do it in a Pythonic way without iteration.
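As a reference point for checking the vectorized answers, here is a minimal sketch of the straightforward approach the question wants to avoid. It loops only over labels that appear in both the index and the columns and sets those cells to NaN with `.loc`. The helper name `mask_diagonal_loop` is hypothetical. Casting to float first mirrors what `DataFrame.mask` does, since an int64 column cannot hold NaN.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])


def mask_diagonal_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: NaN out cells whose row label equals the column label."""
    # NaN is a float, so convert first; int64 columns cannot store it.
    out = df.astype(float)
    # Only labels present on both axes can produce a matching cell.
    for label in df.index.intersection(df.columns):
        out.loc[label, label] = np.nan
    return out


print(mask_diagonal_loop(a))
```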

Upvotes: 3

Views: 972

Answers (3)

Scott Boston

Reputation: 153460

Try using pd.DataFrame.apply to apply a condition to each column that compares the index with the Series (column) name. The result of the apply is a boolean DataFrame, which you can pass to pd.DataFrame.mask:

a.mask(a.apply(lambda x: x.name == x.index))

Output:

     a     b     c
a  NaN   1.0   2.0
b  3.0   NaN   5.0
c  6.0   7.0   NaN
d  9.0  10.0  11.0
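To see what the `apply` call hands to `mask`, here is a small sketch that prints the intermediate pieces. For each column passed to the lambda, `x.name` is the column label and `x.index` holds the row labels, so the comparison yields a boolean array that is True only at the row whose label matches; `apply` assembles those arrays into a boolean DataFrame aligned with `a`.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])

# One column on its own: comparing its name to its index gives a
# boolean array that is True only at row "b".
col = a["b"]
print(col.name == col.index)

# apply repeats the comparison for every column and assembles a
# boolean DataFrame with the same labels as `a`.
cond = a.apply(lambda x: x.name == x.index)
print(cond)

# mask puts NaN wherever cond is True.
print(a.mask(cond))
```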

Also, inspired by @QuangHoang, you can use np.equal.outer:

a.mask(np.equal.outer(a.index, a.columns)) 

Output:

     a     b     c
a  NaN   1.0   2.0
b  3.0   NaN   5.0
c  6.0   7.0   NaN
d  9.0  10.0  11.0
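`np.equal.outer` applies the equality ufunc to every (row label, column label) pair, giving a 2-D boolean array of shape `(len(a.index), len(a.columns))`, and `DataFrame.mask` accepts a plain array of that shape as its condition. The sketch below converts the labels with `to_numpy()` first; that is an assumption made to keep the call independent of how a given pandas version handles ufunc methods on `Index` objects.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])

rows = a.index.to_numpy()    # row labels, shape (4,)
cols = a.columns.to_numpy()  # column labels, shape (3,)

# cond[i, j] is True exactly when rows[i] == cols[j].
cond = np.equal.outer(rows, cols)
print(cond.shape)

print(a.mask(cond))
```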

Upvotes: 5

Quang Hoang

Reputation: 150745

You can use broadcasting as well:

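a.mask(a.index[:, None] == a.columns[None, :])

Or, since newer pandas versions no longer support multi-dimensional indexing such as a.index[:, None] directly on an Index, convert to NumPy arrays first:

a.mask(a.index.values[:, None] == a.columns.values[None, :])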

Output:

     a     b     c
a  NaN   1.0   2.0
b  3.0   NaN   5.0
c  6.0   7.0   NaN
d  9.0  10.0  11.0
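The broadcasting version builds the same grid as `np.equal.outer`. Adding an axis with `[:, None]` turns the row labels into a (4, 1) column and `[None, :]` turns the column labels into a (1, 3) row; comparing them stretches both to (4, 3). A short sketch that checks those shapes, using `to_numpy()` in place of `.values` (both return the underlying NumPy array here):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])

row_labels = a.index.to_numpy()[:, None]     # shape (4, 1)
col_labels = a.columns.to_numpy()[None, :]   # shape (1, 3)

# Broadcasting compares every row label with every column label.
cond = row_labels == col_labels              # shape (4, 3)
print(row_labels.shape, col_labels.shape, cond.shape)

# Same boolean grid as the np.equal.outer approach.
print(np.array_equal(cond, np.equal.outer(a.index.to_numpy(), a.columns.to_numpy())))

print(a.mask(cond))
```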

Upvotes: 2

Erfan

Reputation: 42916

Using DataFrame.stack and MultiIndex.get_level_values:

st = a.stack()
m = st.index.get_level_values(0) == st.index.get_level_values(1) 
a = st.mask(m).unstack()

     a     b     c
a  NaN   1.0   2.0
b  3.0   NaN   5.0
c  6.0   7.0   NaN
d  9.0  10.0  11.0
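In the stack approach, `a.stack()` turns the frame into a Series indexed by a two-level MultiIndex of (row label, column label) pairs, so comparing level 0 with level 1 marks exactly the cells to hide; `unstack()` then pivots the masked Series back into the original layout. A sketch that prints the intermediate steps, writing to a new name `result` instead of overwriting `a`:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])

# Long format: one entry per cell, keyed by (row label, column label).
st = a.stack()
print(st.head())

# True where the row label (level 0) equals the column label (level 1).
m = st.index.get_level_values(0) == st.index.get_level_values(1)

# Mask the long Series, then pivot the column labels back out.
result = st.mask(m).unstack()
print(result)
```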

Upvotes: 2
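A note that applies to all three answers: the masked columns come back as float64 rather than the integers shown in the question, because NaN is a floating-point value and a NumPy int64 column cannot store it. If integer columns matter, one option, assuming a reasonably recent pandas version with nullable integer dtypes, is to convert to the "Int64" extension dtype before masking; masked cells then show <NA> instead of NaN. The condition below reuses the np.equal.outer idea from the first answer.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])

cond = np.equal.outer(a.index.to_numpy(), a.columns.to_numpy())

# Default: NaN forces a cast from int64 to float64.
print(a.mask(cond).dtypes)

# Nullable integer dtype: masked cells become pd.NA and columns stay integer.
masked_int = a.astype("Int64").mask(cond)
print(masked_int)
print(masked_int.dtypes)
```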
