Reputation: 25
I have two dataframes: Users and Item_map
.
Users
consists of user and fake_item_ids
stored in three columns.
Item_map
consists of real_item_ids and fake_item_ids
.
What I want is to replace all of the the fake_item_ids
with the real_item_ids
.
To illustrate with dummy code:
DataFrame Users
user fake_0 fake_1
0 1 6786 3938
1 2 6786 6786
2 3 4345 4345
3 4 7987 3938
4 5 7987 5464
DataFrame Item_map
real_id fake_id
0 101 7987
1 202 6786
2 303 5464
3 404 4345
4 505 3938
Expected results:
DataFrame Users
user real_0 real_1
0 1 202 505
1 2 202 202
2 3 404 404
3 4 101 505
4 5 101 303
I have tried the following, based on an answer found here: how to concat two data frames with different column names in pandas? - python
users['fake_0'] = users.merge(items.rename(columns={'fake_id': 'fake_0'}), how='inner')['real_id']
which resulted in this:
user fake_0 fake_1
0 1 202 3938
1 2 202 6786
2 3 404 4345
3 4 101 3938
4 5 101 5464
This works, but it seems silly to do so for every column separately (I have nine columns that have fake_ids that need to be real_ids).
Any help is much appreciated!
Dummy code:
users = pd.DataFrame({
'user': [1, 2, 3, 4, 5],
'fake_0': [6786, 6786, 4345, 7987, 7987],
'fake_1': [3938, 6786, 4345, 3938, 5464]
})
item_map = pd.DataFrame({
'real_id': [101, 202, 303, 404, 505],
'fake_id': [7987, 6786, 5464, 4345, 3938]
})
Upvotes: 1
Views: 159
Reputation: 323376
We using replace
df.replace(dict(zip(df1.fake_id,df1.real_id)))
Out[46]:
user fake_0 fake_1
0 1 202 505
1 2 202 202
2 3 404 404
3 4 101 505
4 5 101 303
Upvotes: 1
Reputation: 6543
I'm not sure if this will be the most efficient solution, but it should work for your example with 10 columns without you having to edit anything.
First, create a lookup dictionary from your item_map:
d = pd.Series(index=item_map['fake_id'], data=item_map['real_id'].values).to_dict()
Then, use applymap
to look up each column except 'user':
results = users.set_index('user').applymap(lambda x: d[x]).reset_index()
If you want, you can then rename the columns to get your desired output:
results.columns = [col.replace('fake', 'real') for col in results.columns]
Results:
user real_0 real_1
0 1 202 505
1 2 202 202
2 3 404 404
3 4 101 505
4 5 101 303
Upvotes: 0