Jjang
Jjang

Reputation: 11444

Pandas replacing values in a column by values in another column

Let's say I have the following dataframe X (ppid is unique):

    ppid  col2 ...
1   'id1'  '1'
2   'id2'  '2'
3   'id3'  '3'
...

I have another dataframe which serves as a mapping. ppid is same as above and unique, however it might not contain all X's ppids:

    ppid  val
1   'id1' '5'
2   'id2' '6'

I would like to use the mapping dataframe to switch col2 in dataframe X according to where the ppids are equal (in reality, they're multiple columns which are unique together), to get:

    ppid  col2 ...
1   'id1'  '5'
2   'id2'  '6'
3   'id3'  '3' # didn't change, as there's no match
...

Upvotes: 1

Views: 94

Answers (5)

Corralien
Corralien

Reputation: 120409

Input data:

>>> dfX
    ppid col1 col2 col3
0  'id1'  'X'  '5'  'A'
1  'id2'  'Y'  '6'  'B'
2  'id3'  'Z'  '3'  'C'

>>> dfM
    ppid  val
0  'id1'  '5'
1  'id2'  '6'

dfX is your first dataframe and dfM is your mapping dataframe:

>>> dfM.rename(columns={'val': 'col2'}).combine_first(dfX).loc[:, df.columns]

    ppid col1 col2 col3
0  'id1'  'X'  '5'  'A'
1  'id2'  'Y'  '6'  'B'
2  'id3'  'Z'  '3'  'C'

Upvotes: 2

takimas
takimas

Reputation: 36

Have a look at Jeremy Z answer on this post, for further explanation on solution https://stackoverflow.com/a/55631906/16235276

df1 = df1.set_index('ppid')
df2 = df2.set_index('ppid')
df1.update(df2)
df1.reset_index(inplace=True)

Upvotes: 2

caring-goat-913
caring-goat-913

Reputation: 4049

df1 = pd.DataFrame({'ppid': ['id1', 'id2', 'id3'], 'col2': ['1', '2', '3']})
df2 = pd.DataFrame({'ppid': ['id1', 'id2'], 'col2': ['5', '6']})

merged = df1.merge(df2, how='left', on='ppid')
merged['col2_y'].fillna(merged['col2_x'], inplace=True)

merged
  ppid col2_x col2_y
0  id1      1      5
1  id2      2      6
2  id3      3      3

Upvotes: 0

tomtomfox
tomtomfox

Reputation: 304

First merge your dataframes then use pd.Series.combine_first

df1 = pd.merge(df1, df2, how='left', on='ppid')
df1['col2'] = df1.val.combine_first(df1.col2)
del df1['val']

Upvotes: 0

Scott Boston
Scott Boston

Reputation: 153460

Try using map with set_index:

df_x = pd.DataFrame({'ppid':['id1','id2','id3'], 'col2':[*'123']})

df_a = pd.DataFrame({'ppid':['id1','id2'], 'val':[*'56']})

df_x['col2'] = df_x['ppid'].map(df_a.set_index('ppid')['val']).fillna(df_x['col2'])

Output:

  ppid col2
0  id1    5
1  id2    6
2  id3    3

Upvotes: 2

Related Questions