Louis
Louis

Reputation: 25

Merge two dataframes on different named columns for multiple columns

I have two dataframes: Users and Item_map.

Users consists of user and fake_item_ids stored in three columns.

Item_map consists of real_item_ids and fake_item_ids.

What I want is to replace all of the the fake_item_ids with the real_item_ids.

To illustrate with dummy code:

DataFrame Users

   user  fake_0  fake_1
0     1    6786    3938
1     2    6786    6786
2     3    4345    4345
3     4    7987    3938
4     5    7987    5464
DataFrame Item_map

   real_id  fake_id
0      101     7987
1      202     6786
2      303     5464
3      404     4345
4      505     3938

Expected results:

DataFrame Users
   user  real_0  real_1
0     1    202     505
1     2    202     202
2     3    404     404
3     4    101     505
4     5    101     303

I have tried the following, based on an answer found here: how to concat two data frames with different column names in pandas? - python

users['fake_0'] = users.merge(items.rename(columns={'fake_id': 'fake_0'}), how='inner')['real_id']

which resulted in this:

   user  fake_0  fake_1
0     1     202    3938
1     2     202    6786
2     3     404    4345
3     4     101    3938
4     5     101    5464

This works, but it seems silly to do so for every column separately (I have nine columns that have fake_ids that need to be real_ids).

Any help is much appreciated!

Dummy code:

  users = pd.DataFrame({
    'user': [1, 2, 3, 4, 5], 
    'fake_0': [6786, 6786, 4345, 7987, 7987],
    'fake_1': [3938, 6786, 4345, 3938, 5464]

    })
  item_map = pd.DataFrame({
    'real_id': [101, 202, 303, 404, 505],
    'fake_id': [7987, 6786, 5464, 4345, 3938]
    })

Upvotes: 1

Views: 159

Answers (2)

BENY
BENY

Reputation: 323376

We using replace

df.replace(dict(zip(df1.fake_id,df1.real_id)))
Out[46]: 
   user  fake_0  fake_1
0     1     202     505
1     2     202     202
2     3     404     404
3     4     101     505
4     5     101     303

Upvotes: 1

sjw
sjw

Reputation: 6543

I'm not sure if this will be the most efficient solution, but it should work for your example with 10 columns without you having to edit anything.

First, create a lookup dictionary from your item_map:

d = pd.Series(index=item_map['fake_id'], data=item_map['real_id'].values).to_dict()

Then, use applymap to look up each column except 'user':

results = users.set_index('user').applymap(lambda x: d[x]).reset_index()

If you want, you can then rename the columns to get your desired output:

results.columns = [col.replace('fake', 'real') for col in results.columns]

Results:

   user  real_0  real_1
0     1     202     505
1     2     202     202
2     3     404     404
3     4     101     505
4     5     101     303

Upvotes: 0

Related Questions