Efficiently handling multiple For Loops and If Statements in Python

Question

I am working with two dataframes. One containing the pair of unique keys and the other having values of those unique keys. There are almost 5.8million pairs.

Dataframe 1 - pair_df

key1	key2
a	b
a	c
b	c
e	f

Dataframe 2 - key_value_df

key	value
a	432
b	654
c	874
e	014
f	421

I want a dataframe in such a way that in the dataframe 1, the respective values for both keys are present as additional columns.

Required Dataframe

key1	key2	value1	value2
a	b	432	654
a	c	432	874
b	c	654	874
e	f	014	421

I tried with the following code:

def find_vect(key_value_df, pair_df, key_name, vect_name):
    pair_df[vect_name]=''
    count=0
    for idx1, pic1 in enumerate(pair_df[key_name]):  # key_name='key1
        for idx2, pic2 in enumerate(key_value_df['key']):
            if pic1==pic2:
                vect = list(key_value_df.iloc[idx2, 0:49])
                pair_df.loc[idx1, vect_name] = vect
                count+=1
                if count%10000==0:
                    print(count)

This is a simplified version of the actual code and might contain some error. The logic is working but due to huge number of data, it is taking a lot of time for the process. Also same code has to be rerun for the other key in dataframe 1. This is making the process very time consuming.

Is there any other efficient way to solve the problem?

jezrael · Accepted Answer

Use Series.map by Series created from key_value_df by set key to index:

s = key_value_df.set_index('key')['value']

pair_df['value1'] = pair_df['key1'].map(s)
pair_df['value2'] = pair_df['key2'].map(s)

Efficiently handling multiple For Loops and If Statements in Python

Answers (1)

Related Questions