quant
quant

Reputation: 4482

how to change the values of specific rows for specifc columns, with the values of specific rows in the same dataframe in pandas

I have the following dataframe

import pandas as pd


       dt = pd.DataFrame({'id_resp':[1,2,3,4,1,2],
                       'country':['a','a','a','a','b','b'],
                       'variable_1':[0,0,0,0,4,8],
                       'variable_2': [2, 3, 4, 5, 14, 18],
                       'variable3': [1, 1, 1, 1, 2, 2]})

I would like to change the values of the variable_1 and variable_2 for the id_resps in country a that are also in country b, with the values of the variable_1 and variable_2 respectively of the id_resps that are in country b

My resulting dataframe should like

  dt_f = pd.DataFrame({'id_resp':[1,2,3,4,1,2],
                           'country':['a','a','a','a','b','b'],
                           'variable_1':[4,8,0,0,4,8],
                           'variable_2': [14, 18, 4, 5, 14, 18],
                           'variable3': [1, 1, 1, 1, 2, 2]})

Is there a pythonic way of doing that (also selection the variable_1 and variable_2 using a regex) ?

Upvotes: 2

Views: 34

Answers (2)

jezrael
jezrael

Reputation: 862511

Use DataFrame.merge with left join ony be filtered rows, then replace missing values by original by DataFrame.fillna, get columns by original data and last convertm types by original by DataFrame.astype with DataFrame.dtypes:

df1 = dt[dt['country'].eq('b')]
df = (dt.merge(df1, on='id_resp', how='left', suffixes=('_',''))
        .fillna(dt)[dt.columns]
        .astype(dt.dtypes))
print (df)
   id_resp country  variable_1  variable_2
0        1       b           4          14
1        2       b           8          18
2        3       a           0           4
3        4       a           0           5
4        1       b           4          14
5        2       b           8          18

EDIT: If want filter only some columns with regex one possible solution with DataFrame.filter, also is necessary add id_resp with selected columns:

dt = pd.DataFrame({'id_resp':[1,2,3,4,1,2],
                       'country':['a','a','a','a','b','b'],
                       'variable_1':[0,0,0,0,4,8],
                       'variable_2': [2, 3, 4, 5, 14, 18],
                       'variable3': [1, 1, 1, 1, 2, 2]})


df1 = dt[dt['country'].eq('b')].filter(regex='^variable_|id_resp')
print (df1)
   id_resp  variable_1  variable_2
4        1           4          14
5        2           8          18

df = (dt.merge(df1, on='id_resp', how='left', suffixes=('_',''))
        .fillna(dt)[dt.columns]
        .astype(dt.dtypes))
print (df)
   id_resp country  variable_1  variable_2  variable3
0        1       a           4          14          1
1        2       a           8          18          1
2        3       a           0           4          1
3        4       a           0           5          1
4        1       b           4          14          2
5        2       b           8          18          2

Upvotes: 2

anky
anky

Reputation: 75080

Here is another method using unstack and np.where

dt = dt.set_index(['id_resp','country'])
m = dt.unstack().swaplevel(axis=1)
c = m.loc[:,'b'].notna().any(1)
m.loc[:,'a'] = np.where(c[:,None],m.loc[:,'b'],m.loc[:,'a'])

dt = m.stack(0).reindex(dt.index).reset_index()
print(dt)
   id_resp country  variable_1  variable_2
0        1       a         4.0        14.0
1        2       a         8.0        18.0
2        3       a         0.0         4.0
3        4       a         0.0         5.0
4        1       b         4.0        14.0
5        2       b         8.0        18.0

Upvotes: 1

Related Questions