b00kgrrl

Reputation: 597

How to check if dataframe row name matches column name

I want to loop through a dataframe and check whether each row name matches a column name. Where they match, I want to set the value at the intersection to zero. I've tried several options, but none of them work. Here is pseudocode that shows what I want to do:

for row_name in dataframe.index:
    for column_name in dataframe.columns:
        if row_name == column_name:
            dataframe.loc[row_name, column_name] = 0

This is what the data looks like:

        NAME1    NAME2    NAME3
NAME1    1       .9         .2
NAME2    .6      1          .7
NAME3    .5      .8         1

Upvotes: 2

Views: 2572

Answers (2)

EdChum

Reputation: 394209

This is more convoluted than @jpp's method, but you could stack the df so that the column names form the second level of the index:

In[296]:
stack = df.stack()
stack

Out[296]: 
NAME1  NAME1    1.0
       NAME2    0.9
       NAME3    0.2
NAME2  NAME1    0.6
       NAME2    1.0
       NAME3    0.7
NAME3  NAME1    0.5
       NAME2    0.8
       NAME3    1.0
dtype: float64

Then we can mask the stacked Series and set the values to 0 where the two index levels match:

In[297]:
stack.loc[stack.index.get_level_values(0) == stack.index.get_level_values(1)] = 0
stack

Out[297]: 
NAME1  NAME1    0.0
       NAME2    0.9
       NAME3    0.2
NAME2  NAME1    0.6
       NAME2    0.0
       NAME3    0.7
NAME3  NAME1    0.5
       NAME2    0.8
       NAME3    0.0
dtype: float64

Then we call unstack to revert back to our original df:

In[298]:
stack.unstack()

Out[298]: 
       NAME1  NAME2  NAME3
NAME1    0.0    0.9    0.2
NAME2    0.6    0.0    0.7
NAME3    0.5    0.8    0.0

This has more of a performance hit on a small df, since the calls to stack and unstack create temporary objects, but if many index and column values overlap, it avoids the looping.
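
For anyone who wants to run the steps above end to end, here is a minimal sketch. The frame is rebuilt from the sample data in the question, and the helper name zero_matching_labels is made up for illustration; it is not a pandas function. It assumes the row and column labels are unique and the frame has no missing values.

```python
import pandas as pd

# Sample data copied from the question.
names = ["NAME1", "NAME2", "NAME3"]
df = pd.DataFrame(
    [[1.0, 0.9, 0.2], [0.6, 1.0, 0.7], [0.5, 0.8, 1.0]],
    index=names,
    columns=names,
)


def zero_matching_labels(frame):
    """Hypothetical helper: 0 wherever the row label equals the column label."""
    # Series indexed by (row label, column label) pairs.
    stacked = frame.stack()
    # Boolean mask that is True where both index levels hold the same label.
    same = stacked.index.get_level_values(0) == stacked.index.get_level_values(1)
    stacked.loc[same] = 0
    # Pivot the second index level back out into columns.
    return stacked.unstack()


result = zero_matching_labels(df)
print(result)
```

Wrapping the three steps in one function keeps the intermediate stacked Series out of the calling code.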

Upvotes: 0

jpp

Reputation: 164773

You can calculate the intersection of your index and columns, then iterate over the intersection and use pd.DataFrame.loc to set the values.

# labels present in both the index and the columns
intersection = df.index.intersection(df.columns)

for item in intersection:
    df.loc[item, item] = 0

print(df)

       NAME1  NAME2  NAME3
NAME1    0.0    0.9    0.2
NAME2    0.6    0.0    0.7
NAME3    0.5    0.8    0.0
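
The same idea also covers frames whose row and column labels only partly overlap. Below is a self-contained sketch of that case. The two-row frame is invented for illustration (it is the question's data with the NAME3 row dropped), and the shared labels are found with Index.intersection.

```python
import pandas as pd

# Hypothetical frame: NAME3 appears as a column but not as a row.
df = pd.DataFrame(
    [[1.0, 0.9, 0.2], [0.6, 1.0, 0.7]],
    index=["NAME1", "NAME2"],
    columns=["NAME1", "NAME2", "NAME3"],
)

# Labels that occur in both the index and the columns.
shared = df.index.intersection(df.columns)

# Only the cells whose row and column labels match are touched.
for label in shared:
    df.loc[label, label] = 0

print(df)
```

Because the loop runs only over shared labels, the NAME3 column is left as it was.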

Upvotes: 1
