Reputation: 3097
I want to mask out the values in a Pandas DataFrame where the index is the same as the column name. For example:
import pandas as pd
import numpy as np
a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])
a b c
a 0 1 2
b 3 4 5
c 6 7 8
d 9 10 11
After masking:
a b c
a NaN 1 2
b 3 NaN 5
c 6 7 NaN
d 9 10 11
This seems simple enough, but I'm not sure how to do it in a Pythonic way, without iterating over the rows and columns.
Upvotes: 3
Views: 972
Reputation: 153460
Try using pd.DataFrame.apply to apply a condition to each column that compares the index labels with the column's name. The result of the apply is a boolean DataFrame, which you can pass to pd.DataFrame.mask:
a.mask(a.apply(lambda x: x.name == x.index))
Output:
a b c
a NaN 1.0 2.0
b 3.0 NaN 5.0
c 6.0 7.0 NaN
d 9.0 10.0 11.0
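To see what mask receives here, it can help to build the boolean frame as a separate step. This is only a sketch of the same idea: the names cond and masked are mine, and the comparison is wrapped in a Series so that apply returns a DataFrame aligned on the original index.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])

# For each column, flag the rows whose label equals the column name.
# cond and masked are illustrative names, not part of the answer above.
cond = a.apply(lambda col: pd.Series(col.index == col.name, index=col.index))

# mask replaces the flagged cells with NaN, which upcasts the ints to float.
masked = a.mask(cond)
print(cond)
print(masked)
```

Row "d" has no matching column, so it is False everywhere in cond and is left untouched.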
Also, inspired by @QuangHoang, you can use np.equal.outer to build the same boolean mask directly from the index and column labels:
a.mask(np.equal.outer(a.index, a.columns))
Output:
a b c
a NaN 1.0 2.0
b 3.0 NaN 5.0
c 6.0 7.0 NaN
d 9.0 10.0 11.0
Upvotes: 5
Reputation: 150745
You can use broadcasting as well:
a.mask(a.index[:,None] == a.columns[None,:])
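Or, going through the underlying NumPy arrays (needed on recent pandas versions, where indexing an Index with [:, None] is no longer supported):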
a.mask(a.index.values[:,None] == a.columns.values[None,:])
Output:
a b c
a NaN 1.0 2.0
b 3.0 NaN 5.0
c 6.0 7.0 NaN
d 9.0 10.0 11.0
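A self-contained sketch of the broadcasting approach, written with to_numpy() instead of .values (my substitution, the comparison itself is unchanged). The names rows, cols, cond and masked are only for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])

# Row labels as a (4, 1) column vector, column labels as a (1, 3) row vector.
rows = a.index.to_numpy()[:, None]
cols = a.columns.to_numpy()[None, :]

# Broadcasting the comparison gives a (4, 3) boolean array,
# True wherever the row label equals the column label.
cond = rows == cols

masked = a.mask(cond)
print(masked)
```

Since cond is a plain array with the same shape as the frame, mask applies it positionally, so no label alignment is involved.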
Upvotes: 2
Reputation: 42916
Using DataFrame.stack and index.get_level_values:
st = a.stack()
m = st.index.get_level_values(0) == st.index.get_level_values(1)
a = st.mask(m).unstack()
Output:
a b c
a NaN 1.0 2.0
b 3.0 NaN 5.0
c 6.0 7.0 NaN
d 9.0 10.0 11.0
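The same steps as a runnable sketch, with the result stored under a new name (masked, my choice) so the original frame is not overwritten:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 index=["a", "b", "c", "d"],
                 columns=["a", "b", "c"])

# stack() turns the frame into a Series indexed by (row label, column label).
st = a.stack()

# Level 0 holds the row labels and level 1 the column labels;
# compare them element by element.
same = st.index.get_level_values(0) == st.index.get_level_values(1)

# Mask the matching entries, then pivot back to the original 4x3 layout.
masked = st.mask(same).unstack()
print(masked)
```

This route goes through a long-format Series, so it is probably heavier than the broadcasting version on large frames, but it reads naturally if the data is already stacked.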
Upvotes: 2