Weees

Reputation: 37

Find exact element position in Dataframe by row and column index

I have a dataframe of shape (650, 650) which looks like the one shown below. I need to find the elements whose row label and column label are the same (label, index, name, call it what you like) and set them to zero.

        car0    car1    car2    car3
car0    0       0,25    0,83    2
car1    1,23    0       0,83    0
car2    6,83    0,25    0       5
car3    0,23    0,55    0,43    0
car2    12      2       0       0,5
car2    0,5     2       0       0,5

Could someone help?

I have been iterating through the columns and rows, but to no avail.

for c,col in df2.iterrows():
    for r,row in df2.iteritems():
        if df2.loc[:,c]== df2[r,:]:
            df2.loc[row,col] = 0
        else: break

Upvotes: 2

Views: 2361

Answers (2)

Valentino

Reputation: 7361

From your example, it seems that the columns and rows have the same labels. If so, you can use:

for i in df.columns:
    # A single .loc call avoids chained assignment, which may write to a copy
    df.loc[i, i] = 0

This works even if the labels are not in the same order (in which case the zeroes do not land on the diagonal).

In case there are column labels which do not appear as row labels and vice versa, you need a slightly more elaborate solution:

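for i in df.columns:
    # Only touch labels that also exist as row labels; assigning through
    # .loc with a missing row label would add a new row instead of failing
    if i in df.index:
        df.loc[i, i] = 0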

In case you have repeated names on rows or columns, please refer to @thesilkworm's answer. You may also use unique() instead of set, but the idea is the same.

for i in df.index.unique().intersection(df.columns.unique()):
    df.loc[i, i] = 0
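For a larger frame such as the 650 x 650 one in the question, the same result can probably be obtained without a Python-level loop by comparing the row labels against the column labels with NumPy broadcasting. This is only a sketch: the sample frame, the `mask` and `result` names, and the all-ones data are made up for illustration and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: every cell is 1.0 and the
# row label "car2" is repeated, as in the question's table.
rows = ["car0", "car1", "car2", "car3", "car2", "car2"]
cols = ["car0", "car1", "car2", "car3"]
df = pd.DataFrame(1.0, index=rows, columns=cols)

# mask[r, c] is True where the label of row r equals the label of column c
mask = df.index.to_numpy()[:, None] == df.columns.to_numpy()[None, :]

# Build a new frame with zeros at the matching positions
result = pd.DataFrame(
    np.where(mask, 0.0, df.to_numpy()),
    index=df.index,
    columns=df.columns,
)
```

Because the comparison is done on labels, repeated row names are handled as well: every row called "car2" gets a zero in the "car2" column. Note that this version returns a new frame rather than modifying `df` in place.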

Upvotes: 2

sjw

Reputation: 6543

Updated answer after question was updated:

First, get a set of unique values which feature in both the index and column names:

names = set(df.columns).intersection(df.index)

Then iterate over them, using .loc to set values:

for name in names:
    df.loc[name, name] = 0

This is similar to the answer by @Valentino, with an amendment to make sure that we only loop over each name once.
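To see how the set-based loop behaves with repeated row labels, here is a small self-contained run. The frame is a made-up stand-in for the question's data (all cells set to 1.0 so the zeroed positions are easy to spot), not the asker's actual dataframe.

```python
import pandas as pd

# Hypothetical sample frame: "car2" appears three times in the index
rows = ["car0", "car1", "car2", "car3", "car2", "car2"]
cols = ["car0", "car1", "car2", "car3"]
df = pd.DataFrame(1.0, index=rows, columns=cols)

# Labels present in both the columns and the index, each taken once
names = set(df.columns).intersection(df.index)

for name in names:
    # With a repeated row label, .loc selects every matching row,
    # so all "car2" rows are zeroed in the "car2" column
    df.loc[name, name] = 0
```

After the loop, six cells are zero: one each for car0, car1 and car3, and three for car2.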


Original answer which assumed that we just had to fill the diagonal:

You could use np.fill_diagonal for this to avoid a loop:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(5, 5)))
np.fill_diagonal(df.values, 0)

Results:

          0         1         2         3         4
0  0.000000  0.707695  0.275748  0.449722  0.321772
1  0.343112  0.000000  0.051894  0.879492  0.210940
2  0.845859  0.016546  0.000000  0.347568  0.233525
3  0.483467  0.094216  0.583731  0.000000  0.242194
4  0.638833  0.382917  0.321501  0.190206  0.000000
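One caveat worth hedging: filling `df.values` in place relies on that array being a writable view of the frame's data, which is not guaranteed for mixed dtypes or when pandas' Copy-on-Write mode is active. A more defensive sketch, assuming it is acceptable to rebuild the frame, works on an explicit copy (the `arr` name is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(5, 5)))

# Take an explicit, writable copy of the data
arr = df.to_numpy(copy=True)

# Zero the main diagonal of the copy
np.fill_diagonal(arr, 0)

# Rebuild the frame, keeping the original row and column labels
df = pd.DataFrame(arr, index=df.index, columns=df.columns)
```

This costs one extra copy of the data, which should be negligible for a 650 x 650 frame.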

Upvotes: 1
