Raj Rajeshwari Prasad
Raj Rajeshwari Prasad

Reputation: 334

Efficiently handling multiple For Loops and If Statements in Python

I am working with two dataframes. One containing the pair of unique keys and the other having values of those unique keys. There are almost 5.8million pairs.

Dataframe 1 - pair_df

key1 key2
a b
a c
b c
e f

Dataframe 2 - key_value_df

key value
a 432
b 654
c 874
e 014
f 421

I want a dataframe in such a way that in the dataframe 1, the respective values for both keys are present as additional columns.

Required Dataframe

key1 key2 value1 value2
a b 432 654
a c 432 874
b c 654 874
e f 014 421

I tried with the following code:

def find_vect(key_value_df, pair_df, key_name, vect_name):
    pair_df[vect_name]=''
    count=0
    for idx1, pic1 in enumerate(pair_df[key_name]):  # key_name='key1
        for idx2, pic2 in enumerate(key_value_df['key']):
            if pic1==pic2:
                vect = list(key_value_df.iloc[idx2, 0:49])
                pair_df.loc[idx1, vect_name] = vect
                count+=1
                if count%10000==0:
                    print(count)

This is a simplified version of the actual code and might contain some error. The logic is working but due to huge number of data, it is taking a lot of time for the process. Also same code has to be rerun for the other key in dataframe 1. This is making the process very time consuming.

Is there any other efficient way to solve the problem?

Upvotes: 1

Views: 63

Answers (1)

jezrael
jezrael

Reputation: 862441

Use Series.map by Series created from key_value_df by set key to index:

s = key_value_df.set_index('key')['value']

pair_df['value1'] = pair_df['key1'].map(s)
pair_df['value2'] = pair_df['key2'].map(s)

Upvotes: 1

Related Questions