Conditional replace of numbers in pandas df

Question

Given the following pandas df:

import pandas as pd


df = pd.DataFrame({'1676': ['R', 'NR', 'R', 'NR'],
                   '1677': ['NR', 'NR', 'NR', 'NR'],
                   '1710': ['R', 'R', 'NR', 'NR'],
                   '1536': ['NR', 'R', 'NR', 'R']})

df
    1676    1677    1710    1536
0   R       NR      R       NR
1   NR      NR      R       R
2   R       NR      NR      NR
3   NR      NR      NR      R

and this longer pandas df2:

df2 = pd.DataFrame({'1': ['1710', '1676', '2651', '1676'],
                    '2': ['2654', '2824', '1676', '1677'],
                    '3': ['1676', '3079', '1677', '2085'],
                    '4': ['1536', '1677', '1409', '1536'],
                    '5': ['510', '1710', '1664', '1710'],
                    '6': ['2590', '3090', '2252', '2916'],
                    '7': ['2777', '1536', '1710', '3140'],
                    '8': ['1677', '1709', '1536', '1963']})

    1       2       3       4       5       6       7       8
0   1710    2654    1676    1536    510     2590    2777    1677
1   1676    2824    3079    1677    1710    3090    1536    1709
2   2651    1676    1677    1409    1664    2252    1710    1536
3   1676    1677    2085    1536    1710    2916    3140    1963

Is the following possible row by row (shown here for the first row)?

for each value in df.loc[0] that equals "R":
    take the corresponding column name (a number)
    search for that number in df2.loc[0]
    replace that number in df2.loc[0] with "R"

So that I get this:

    1       2       3       4       5       6       7       8
0   R       2654    R       1536    510     2590    2777    1677
1   1676    2824    3079    1677    R       3090    R       1709
2   2651    R       1677    1409    1664    2252    1710    1536
3   1676    1677    2085    R       1710    2916    3140    1963

Edit:

It's not working for my specific DataFrames. Any guesses what triggers this issue? I already tried resetting the indices.

Scott Boston · Accepted Answer

Use np.where and replace:

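import numpy as np

# Row and column positions of every 'R' in df
r, c = np.where(df == 'R')

# For each row of df2, replace the df column headers flagged 'R' on that row
df2.apply(lambda x: x.replace(df.columns[c[r == x.name]], 'R'), axis=1)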

Output:

      1     2     3     4     5     6     7     8
0     R  2654     R  1536   510  2590  2777  1677
1  1676  2824  3079  1677     R  3090     R  1709
2  2651     R  1677  1409  1664  2252  1710  1536
3  1676  1677  2085     R  1710  2916  3140  1963

Details:

First, np.where finds the row and column positions in df where the value equals 'R'.
Then apply with axis=1 goes through df2 row by row: x.name is the index label of the current row, r == x.name selects the entries of c that belong to that row, and df.columns[...] turns those positions into column headers of df.
Finally, replace swaps every occurrence of those column headers on that row of df2 for 'R'.
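
To make these steps easier to follow, here is a minimal sketch that spells out the intermediate values and then builds the same result with a boolean mask instead of apply. It is an illustration, not a replacement for the code above; the names labels_by_row, mask and result are made up for this example. It works purely by position, so it assumes that row i of df corresponds to row i of df2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'1676': ['R', 'NR', 'R', 'NR'],
                   '1677': ['NR', 'NR', 'NR', 'NR'],
                   '1710': ['R', 'R', 'NR', 'NR'],
                   '1536': ['NR', 'R', 'NR', 'R']})

df2 = pd.DataFrame({'1': ['1710', '1676', '2651', '1676'],
                    '2': ['2654', '2824', '1676', '1677'],
                    '3': ['1676', '3079', '1677', '2085'],
                    '4': ['1536', '1677', '1409', '1536'],
                    '5': ['510', '1710', '1664', '1710'],
                    '6': ['2590', '3090', '2252', '2916'],
                    '7': ['2777', '1536', '1710', '3140'],
                    '8': ['1677', '1709', '1536', '1963']})

# Step 1: positional (row, column) indices of every 'R' in df.
r, c = np.where((df == 'R').to_numpy())
# r is [0, 0, 1, 1, 2, 3] and c is [0, 2, 2, 3, 0, 3]

# Step 2: for each row position, the df column headers that hold 'R'.
# (labels_by_row is a hypothetical helper name for this sketch.)
labels_by_row = {i: df.columns[c[r == i]].tolist() for i in range(len(df))}
# {0: ['1676', '1710'], 1: ['1710', '1536'], 2: ['1676'], 3: ['1536']}

# Step 3: mark the cells of df2 whose value is one of that row's headers,
# then overwrite the marked cells with 'R'.
mask = np.zeros(df2.shape, dtype=bool)
for pos, labels in labels_by_row.items():
    mask[pos] = df2.iloc[pos].isin(labels).to_numpy()

result = df2.mask(mask, 'R')
print(result)
```

One thing this makes visible: r holds row positions, while x.name in the apply version is an index label. The two only coincide when df2 has the default 0..n-1 index, and the values in df2 only match if they have the same type as the column headers of df (strings here).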
