How to use loops & conditional statements to iterate over 2 dataframes to create a new dataframe

Question

I have 2 dataframes and I would like to use a loop to do the following.

Choose a cluster column to analyze and remove all other cluster columns in both tables

For example: If analyzing Clust 1, in df, only the index, G-pairs, G_A, G_B, and Clust 1 should remain. In df2 only Clust _1_n & clust_1_l should remain.

Go through every row of df and df2, if there is a ID match between G_A in df and clust_1_n in df2, then add a new column to df titled 'G_A_val' and enter the associated Clust_1_l Value
Go through every row of df and df2, if there is a ID match between G_B in df and clust_1_n in df2, then add a new column to df titled 'G_B_val' and enter the associated Clust_1_l Value
Add a new column to df titled sums which contains the sum of G_A_val & G_B_val for each row
Lastly, apply a condition such that if a row in the Clust 1 column is >= 0.051 then replace the sums column value for that row in df with 0

** I can perform these tasks without loops but I'm trying to learn how to use loops in dataframes.

These are the dataframes

import pandas as pd
 
# initialise data of lists.
data = {'g_pairs':['an_jk', 'tf_ha', 'ab_rh', 'et_x2','yr_po'],
        'g_a':['en762','en72b','en925','en980','en009'],
        'g_b':['en361','en231','en666','en771','en909'],
        'clust1':[0.020,1,0.05,0.7,0.001],
        'clust2':[1,1,0.1,0.0001,1],
        'clust3':[0.001,0.002,0.0021,0.3,0.005],
        'clust4':[0.0002,0.0043,0.0067,0.0123,0.0110],
        'clust5':[0.7500,0.0540,0.0030,1,0.0081]}
 
# Creates pandas DataFrame.
df = pd.DataFrame(data)
 
# print the data
df

    g_pairs g_a     g_b     clust1  clust2  clust3  clust4  clust5
0   an_jk   en762   en361   0.020   1.0000  0.0010  0.0002  0.7500
1   tf_ha   en72b   en231   1.000   1.0000  0.0020  0.0043  0.0540
2   ab_rh   en925   en666   0.050   0.1000  0.0021  0.0067  0.0030
3   et_x2   en980   en771   0.700   0.0001  0.3000  0.0123  1.0000
4   yr_po   en009   en909   0.001   1.0000  0.0050  0.0110  0.0081

data2 = {'clust_1n':['en762','en72b','en925','en980','en009','en361','en231','en666','en771','en909'],
         'clust_1l':[0.35,0.30,0.004,0.58,0.55,0.94,0.78,0.62,0.366,0.01],
         'clust_2n':['en762','en72b','en925','en980','en009','en361','en231','en666','en771','en909'],
         'clust_2l':[0.37,0.0130,0.385,0.57,0.6,0.19,0.41,0.92,0.46,0.44],
         'clust_3n':['en762','en72b','en925','en980','en009','en361','en231','en666','en771','en909'],
         'clust_3l':[0.29,0.01130,0.2385,0.457,0.16,0.149,0.411,0.992,0.146,0.944],
         'clust_4n':['en762','en72b','en925','en980','en009','en361','en231','en666','en771','en909'],
         'clust_4l':[0.529,0.0001130,0.12385,0.6457,0.816,0.9149,0.3411,0.0992,0.3146,0.0944],
         'clust_5n':['en762','en72b','en925','en980','en009','en361','en231','en666','en771','en909'],
         'clust_5l':[0.229,0.001130,0.12385,0.4457,0.616,0.1549,0.4911,0.0992,0.9146,0.2944]}
        
 
# Creates pandas DataFrame.
df2 = pd.DataFrame(data2)
 
# print the data
df2

clust_1n    clust_1l    clust_2n    clust_2l    clust_3n    clust_3l    clust_4n    clust_4l    clust_5n    clust_5l
0   en762   0.350   en762   0.370   en762   0.2900  en762   0.529000    en762   0.22900
1   en72b   0.300   en72b   0.013   en72b   0.0113  en72b   0.000113    en72b   0.00113
2   en925   0.004   en925   0.385   en925   0.2385  en925   0.123850    en925   0.12385
3   en980   0.580   en980   0.570   en980   0.4570  en980   0.645700    en980   0.44570
4   en009   0.550   en009   0.600   en009   0.1600  en009   0.816000    en009   0.61600
5   en361   0.940   en361   0.190   en361   0.1490  en361   0.914900    en361   0.15490
6   en231   0.780   en231   0.410   en231   0.4110  en231   0.341100    en231   0.49110
7   en666   0.620   en666   0.920   en666   0.9920  en666   0.099200    en666   0.09920
8   en771   0.366   en771   0.460   en771   0.1460  en771   0.314600    en771   0.91460
9   en909   0.010   en909   0.440   en909   0.9440  en909   0.094400    en909   0.29440

This is what the output should look like

   g_pairs  g_a     g_b     clust1  g_a_val g_b_val sums
0   an_jk   en762   en361   0.020   0.350   0.940   1.29
1   tf_ha   en72b   en231   1.000   0.300   0.780   0.00
2   ab_rh   en925   en666   0.050   0.004   0.620   0.97
3   et_x2   en980   en771   0.700   0.580   0.366   0.00
4   yr_po   en009   en909   0.001   0.550   0.010   0.56

Thank you very much in advance.

How to use loops & conditional statements to iterate over 2 dataframes to create a new dataframe

Answers (1)

Related Questions

How to use loops &amp; conditional statements to iterate over 2 dataframes to create a new dataframe

Answers (1)

Related Questions

How to use loops & conditional statements to iterate over 2 dataframes to create a new dataframe