Reputation: 334
I am working with two dataframes. One containing the pair of unique keys and the other having values of those unique keys. There are almost 5.8million pairs.
Dataframe 1 - pair_df
key1 | key2 |
---|---|
a | b |
a | c |
b | c |
e | f |
Dataframe 2 - key_value_df
key | value |
---|---|
a | 432 |
b | 654 |
c | 874 |
e | 014 |
f | 421 |
I want a dataframe in such a way that in the dataframe 1, the respective values for both keys are present as additional columns.
Required Dataframe
key1 | key2 | value1 | value2 |
---|---|---|---|
a | b | 432 | 654 |
a | c | 432 | 874 |
b | c | 654 | 874 |
e | f | 014 | 421 |
I tried with the following code:
def find_vect(key_value_df, pair_df, key_name, vect_name):
pair_df[vect_name]=''
count=0
for idx1, pic1 in enumerate(pair_df[key_name]): # key_name='key1
for idx2, pic2 in enumerate(key_value_df['key']):
if pic1==pic2:
vect = list(key_value_df.iloc[idx2, 0:49])
pair_df.loc[idx1, vect_name] = vect
count+=1
if count%10000==0:
print(count)
This is a simplified version of the actual code and might contain some error. The logic is working but due to huge number of data, it is taking a lot of time for the process. Also same code has to be rerun for the other key in dataframe 1. This is making the process very time consuming.
Is there any other efficient way to solve the problem?
Upvotes: 1
Views: 63
Reputation: 862441
Use Series.map
by Series
created from key_value_df
by set key
to index
:
s = key_value_df.set_index('key')['value']
pair_df['value1'] = pair_df['key1'].map(s)
pair_df['value2'] = pair_df['key2'].map(s)
Upvotes: 1