raam
raam

Reputation: 241

How to compare two Dataframes on a column and replace with other column value

I have two input dataframes, df1 and df2:

id      first       last  size
  A 1978-01-01 1979-01-01     2
  B 2000-01-01 2000-01-01     1
  C 1998-01-01 2000-01-01     3
  D 1998-01-01 1998-01-01     1
  E 1999-01-01 2000-01-01     2

  id  token       
  A     ZA.00 
  B     As.11
  C     SD.34

My desired output:

id          first       last        size
  ZA.00     1978-01-01 1979-01-01     2
  As.11     2000-01-01 2000-01-01     1
  SD.34     1998-01-01 2000-01-01     3
  D         1998-01-01 1998-01-01     1
  E         1999-01-01 2000-01-01     2

If df2['id'] matches df1['id'] then replace df1['id'] with df2['token']. How can I achieve this?

Upvotes: 2

Views: 1949

Answers (3)

Scott Boston
Scott Boston

Reputation: 153460

Use map and fillna:

df1['id'] = df1['id'].map(df2.set_index('id')['token']).fillna(df1['id'])
df1

Output:

      id       first        last  size
0  ZA.00  1978-01-01  1979-01-01     2
1  As.11  2000-01-01  2000-01-01     1
2  SD.34  1998-01-01  2000-01-01     3
3      D  1998-01-01  1998-01-01     1
4      E  1999-01-01  2000-01-01     2

You can use map with a series as an argument.

Upvotes: 2

Prayson W. Daniel
Prayson W. Daniel

Reputation: 15568

If you do not wish to merge your DataFrame, you could use apply function to solve this. Change your small DataFrame to dictionary and map it to the other DataFrame.

from io import StringIO #used to get string to df

import pandas as pd

id_ =list('ABC')
token = 'ZA.00 As.11 SD.34'.split()
dt = pd.DataFrame(list(zip(id_,token)),columns=['id','token'])

a ='''
id first last size
A 1978-01-01 1979-01-01 2
B 2000-01-01 2000-01-01 1
C 1998-01-01 2000-01-01 3
D 1998-01-01 1998-01-01 1
E 1999-01-01 2000-01-01 2
'''

df =pd.read_csv(StringIO(a), sep=' ')

# This last two lines are all you need
mp= {x:y for x,y in zip(dt.id.tolist(),dt.token.tolist())}

df.id.apply(lambda x: mp[x] if x in mp.keys() else x)

Upvotes: 0

Space Impact
Space Impact

Reputation: 13255

Using Merge and combine_first:

df = df1.merge(df2,how='outer')
df['id'] = df['token'].combine_first(df['id'] )
df.drop('token',inplace=True,axis=1)

Another way is to use replace with dictionary of df2.values, here the df1 dataframe changes.:

df1.id.replace(dict(df2.values),inplace=True)

        id  first   last    size
    0   ZA.00   1978-01-01  1979-01-01  2
    1   As.11   2000-01-01  2000-01-01  1
    2   SD.34   1998-01-01  2000-01-01  3
    3   D   1998-01-01  1998-01-01  1
    4   E   1999-01-01  2000-01-01  2

Upvotes: 1

Related Questions