Reputation: 255
I'm quite new to python (and pandas) and a have a replace task for a large dataframe i couldn't find a solution for.
So i have two dataframes, one (df1
) which looks something like this:
Id Id Id
4954733 3929949 515674
2950086 1863885 4269069
1241018 3711213 4507609
3806276 2035233 4968071
4437138 1248817 1167192
5468160 4726010 2851685
1211786 2604463 5172095
2914539 5235788 4130808
4730974 5835757 1536235
2201352 5779683 5771612
3864854 4784259 2928288
the other dataframe (df2
) containing all the 'old' id's and the corresponding new ones in the next column (from 1 to 20,000), which looks something like this:
Id Id_new
5774290 1
761000 2
3489755 3
1084156 4
2188433 5
3456900 6
4364416 7
3518181 8
3926684 9
5797492 10
4435820 11
what i would like to do is replace all the id's (all columns) in df1
with the corresponding Id_new
from df2
. I guess ideally without having to do a merge
or join
for each column, given the size of the dataset?
The result should be like this: df_new
Id_new Id_new Id_new
8 12 22
16 9 8
21 25 10
10 15 13
29 6 4
22 7 22
30 3 3
11 31 29
32 29 27
12 3 4
14 6 24
Any tips would be great, thanks in advance!
Upvotes: 1
Views: 276
Reputation: 862681
I think you need replace
by Series
created by set_index
:
print (df1)
Id Id.1 Id.2
0 4954733 3929949 515674 <-first value changed for match data
1 2950086 1863885 4269069
2 1241018 3711213 4507609
3 3806276 2035233 4968071
4 4437138 1248817 1167192
5 5468160 4726010 2851685
6 1211786 2604463 5172095
7 2914539 5235788 4130808
8 4730974 5835757 1536235
9 2201352 5779683 5771612
10 3864854 4784259 2928288
df = df1.replace(df2.set_index('Id')['Id_new'])
print (df)
Id Id.1 Id.2
0 1 3929949 515674
1 2950086 1863885 4269069
2 1241018 3711213 4507609
3 3806276 2035233 4968071
4 4437138 1248817 1167192
5 5468160 4726010 2851685
6 1211786 2604463 5172095
7 2914539 5235788 4130808
8 4730974 5835757 1536235
9 2201352 5779683 5771612
10 3864854 4784259 2928288
Upvotes: 3