Reputation: 4482
I have the following dataframe
import pandas as pd
dt = pd.DataFrame({'id_resp':[1,2,3,4,1,2],
'country':['a','a','a','a','b','b'],
'variable_1':[0,0,0,0,4,8],
'variable_2': [2, 3, 4, 5, 14, 18],
'variable3': [1, 1, 1, 1, 2, 2]})
I would like to change the values of the variable_1
and variable_2
for the id_resp
s in country
a
that are also in country
b
, with the values of the variable_1
and variable_2
respectively of the id_resp
s that are in country b
My resulting dataframe should like
dt_f = pd.DataFrame({'id_resp':[1,2,3,4,1,2],
'country':['a','a','a','a','b','b'],
'variable_1':[4,8,0,0,4,8],
'variable_2': [14, 18, 4, 5, 14, 18],
'variable3': [1, 1, 1, 1, 2, 2]})
Is there a pythonic way of doing that (also selection the variable_1
and variable_2
using a regex
) ?
Upvotes: 2
Views: 34
Reputation: 862511
Use DataFrame.merge
with left join ony be filtered rows, then replace missing values by original by DataFrame.fillna
, get columns by original data and last convertm types by original by DataFrame.astype
with DataFrame.dtypes
:
df1 = dt[dt['country'].eq('b')]
df = (dt.merge(df1, on='id_resp', how='left', suffixes=('_',''))
.fillna(dt)[dt.columns]
.astype(dt.dtypes))
print (df)
id_resp country variable_1 variable_2
0 1 b 4 14
1 2 b 8 18
2 3 a 0 4
3 4 a 0 5
4 1 b 4 14
5 2 b 8 18
EDIT: If want filter only some columns with regex one possible solution with DataFrame.filter
, also is necessary add id_resp
with selected columns:
dt = pd.DataFrame({'id_resp':[1,2,3,4,1,2],
'country':['a','a','a','a','b','b'],
'variable_1':[0,0,0,0,4,8],
'variable_2': [2, 3, 4, 5, 14, 18],
'variable3': [1, 1, 1, 1, 2, 2]})
df1 = dt[dt['country'].eq('b')].filter(regex='^variable_|id_resp')
print (df1)
id_resp variable_1 variable_2
4 1 4 14
5 2 8 18
df = (dt.merge(df1, on='id_resp', how='left', suffixes=('_',''))
.fillna(dt)[dt.columns]
.astype(dt.dtypes))
print (df)
id_resp country variable_1 variable_2 variable3
0 1 a 4 14 1
1 2 a 8 18 1
2 3 a 0 4 1
3 4 a 0 5 1
4 1 b 4 14 2
5 2 b 8 18 2
Upvotes: 2
Reputation: 75080
Here is another method using unstack
and np.where
dt = dt.set_index(['id_resp','country'])
m = dt.unstack().swaplevel(axis=1)
c = m.loc[:,'b'].notna().any(1)
m.loc[:,'a'] = np.where(c[:,None],m.loc[:,'b'],m.loc[:,'a'])
dt = m.stack(0).reindex(dt.index).reset_index()
print(dt)
id_resp country variable_1 variable_2
0 1 a 4.0 14.0
1 2 a 8.0 18.0
2 3 a 0.0 4.0
3 4 a 0.0 5.0
4 1 b 4.0 14.0
5 2 b 8.0 18.0
Upvotes: 1