how to change the values of specific rows for specifc columns, with the values of specific rows in the same dataframe in pandas

Question

I have the following dataframe

import pandas as pd


       dt = pd.DataFrame({'id_resp':[1,2,3,4,1,2],
                       'country':['a','a','a','a','b','b'],
                       'variable_1':[0,0,0,0,4,8],
                       'variable_2': [2, 3, 4, 5, 14, 18],
                       'variable3': [1, 1, 1, 1, 2, 2]})

I would like to change the values of the variable_1 and variable_2 for the id_resps in country a that are also in country b, with the values of the variable_1 and variable_2 respectively of the id_resps that are in country b

My resulting dataframe should like

  dt_f = pd.DataFrame({'id_resp':[1,2,3,4,1,2],
                           'country':['a','a','a','a','b','b'],
                           'variable_1':[4,8,0,0,4,8],
                           'variable_2': [14, 18, 4, 5, 14, 18],
                           'variable3': [1, 1, 1, 1, 2, 2]})

Is there a pythonic way of doing that (also selection the variable_1 and variable_2 using a regex) ?

jezrael · Accepted Answer

Use DataFrame.merge with left join ony be filtered rows, then replace missing values by original by DataFrame.fillna, get columns by original data and last convertm types by original by DataFrame.astype with DataFrame.dtypes:

df1 = dt[dt['country'].eq('b')]
df = (dt.merge(df1, on='id_resp', how='left', suffixes=('_',''))
        .fillna(dt)[dt.columns]
        .astype(dt.dtypes))
print (df)
   id_resp country  variable_1  variable_2
0        1       b           4          14
1        2       b           8          18
2        3       a           0           4
3        4       a           0           5
4        1       b           4          14
5        2       b           8          18

EDIT: If want filter only some columns with regex one possible solution with DataFrame.filter, also is necessary add id_resp with selected columns:

dt = pd.DataFrame({'id_resp':[1,2,3,4,1,2],
                       'country':['a','a','a','a','b','b'],
                       'variable_1':[0,0,0,0,4,8],
                       'variable_2': [2, 3, 4, 5, 14, 18],
                       'variable3': [1, 1, 1, 1, 2, 2]})


df1 = dt[dt['country'].eq('b')].filter(regex='^variable_|id_resp')
print (df1)
   id_resp  variable_1  variable_2
4        1           4          14
5        2           8          18

df = (dt.merge(df1, on='id_resp', how='left', suffixes=('_',''))
        .fillna(dt)[dt.columns]
        .astype(dt.dtypes))
print (df)
   id_resp country  variable_1  variable_2  variable3
0        1       a           4          14          1
1        2       a           8          18          1
2        3       a           0           4          1
3        4       a           0           5          1
4        1       b           4          14          2
5        2       b           8          18          2

how to change the values of specific rows for specifc columns, with the values of specific rows in the same dataframe in pandas

Answers (2)

Related Questions