Reputation: 25
So i am working with a dataset with two data frames. The Data Frames look like this:
df1:
Item_ID Item_Name
0 A
1 B
2 C
df2:
Item_slot_1 Item_slot_2 Item_Slot_3
2 2 1
1 2 0
0 1 1
The values in df2 represent the Item_ID from df1. How can i replace the values in df2 from the item_id to the actual item name so that df2 can look like:
Item_slot_1 Item_slot_2 Item_Slot_3
C C B
B C A
A B B
The data set in reality is much larger and has way more id's and names than just a,b and c
Upvotes: 1
Views: 89
Reputation: 51165
You can improve the speed of dictionary mapping with numpy
. If your items are numbered 0-N
this is trivial, if they are not, it gets a bit more tricky, but is still easily doable.
If the items in df1
are numbered 0-N
, use basic indexing:
a = df1['Item_Name'].values
b = df2.values
pd.DataFrame(a[b], columns=df2.columns)
Item_slot_1 Item_slot_2 Item_Slot_3
0 C C B
1 B C A
2 A B B
If they are not numbered 0-N
, here is a more general approach:
x = df1['Item_ID'].values
y = df1['Item_Name'].values
z = df2.values
m = np.arange(x.max() + 1, dtype=object)
m[x] = y
pd.DataFrame(m[z], columns=df2.columns)
Item_slot_1 Item_slot_2 Item_Slot_3
0 C C B
1 B C A
2 A B B
To only replace a subset of columns from df2
is also simple, let's demonstrate only replacing the first two columns of df2
:
x = df1['Item_ID'].values
y = df1['Item_Name'].values
cols = ['Item_slot_1', 'Item_slot_2']
z = df2[cols].values
m = np.arange(x.max() + 1, dtype=object)
m[x] = y
df2[cols] = m[z]
Item_slot_1 Item_slot_2 Item_Slot_3
0 C C 1
1 B C 0
2 A B 1
This type of indexing nets a hefty performance gain over apply
and replace
:
import string
df1 = pd.DataFrame({'Item_ID': np.arange(26), 'Item_Name': list(string.ascii_uppercase)})
df2 = pd.DataFrame(np.random.randint(1, 26, (10000, 100)))
%%timeit
s = dict(zip(df1['Item_ID'], df1['Item_Name']))
df2.applymap(s.get)
158 ms ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
s = dict(zip(df1['Item_ID'], df1['Item_Name']))
df2.replace(s)
750 ms ± 34.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
s = dict(zip(df1['Item_ID'], df1['Item_Name']))
df2.apply(lambda x: x.map(s))
93.1 ms ± 4.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
x = df1['Item_ID'].values
y = df1['Item_Name'].values
z = df2.values
m = np.arange(x.max() + 1, dtype=object)
m[x] = y
pd.DataFrame(m[z], columns=df2.columns)
30.4 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 0
Reputation: 862471
Create dictionary by zip
and pass it to applymap
, or replace
or apply
with map
:
s = dict(zip(df1['Item_ID'], df1['Item_Name']))
#if value not exist in df1['Item_ID'] get None in df2
df2 = df2.applymap(s.get)
Or:
#if value not exist in df1['Item_ID'] get original value in df2
df2 = df2.replace(s)
Or:
#if value not exist in df1['Item_ID'] get NaN in df2
df2 = df2.apply(lambda x: x.map(s))
print (df2)
Item_slot_1 Item_slot_2 Item_Slot_3
0 C C B
1 B C A
2 A B B
EDIT:
You can specified columns by names for process:
cols = ['Item_slot_1','Item_slot_2','Item_Slot_3']
df2[cols] = df2[cols].applymap(s.get)
df2[cols] = df2[cols].replace(s)
df2[cols] = df2[cols].apply(lambda x: x.map(s))
Upvotes: 1