Create a column by comparing two pandas dataframes

Question

Hello I am trying to create a new column in a data frame by copying values from a data frame column such that if the value of another column satisfies a condition a based on the columns of other two columns in another data frame.

To put things into more perspective, here are the two data frames

df1:
Time_Ratio time_elapsed  delta_retention

Time_Ratio time_elapsed  delta_retention

0.22        0.05          0.000642
0.23        0.07          0.000693
0.24        1.75          0.000740
0.25        3.75          0.000784
0.26        4.50          0.000825
0.27       7.25           0.000725
0.74       277.50         0.000593
0.75       281.25         0.000557
0.76       285.00         0.000523
0.77       288.75         0.000489

df2:
Start         End         Confidence
0.000000    1.416666      0.319943
0.000000    1.416666      0.424269
0.000000    1.416666      0.605917
0.000000    1.416666      0.602294
0.000000    1.416666      0.562171
0.000000    1.416666      0.790947
0.000000    1.416666      0.545534
1.433333    3.616666      0.618783
1.433333    3.616666      0.803730
1.433333    3.616666      0.385044
3.633333    7.700000      0.490057
3.633333    7.700000      0.685507
239.850000  340.716666    0.937618
239.850000  340.716666    0.829939
239.850000  340.716666    0.852264
239.850000  340.716666    0.820325

What I am trying to do is to find the cumulative delta retention between start and end for all time elapsed falling between start and end. This is what the final data frame looks like

Start         End         Confidence     cum_delta_retention
0.000000    1.416666      0.319943      0.000642 + 0.000693
0.000000    1.416666      0.424269      0.000642 + 0.000693
0.000000    1.416666      0.605917      0.000642 + 0.000693
0.000000    1.416666      0.602294      0.000642 + 0.000693
0.000000    1.416666      0.562171      0.000642 + 0.000693
0.000000    1.416666      0.790947      0.000642 + 0.000693
0.000000    1.416666      0.545534      0.000642 + 0.000693
1.433333    3.616666      0.618783      0.000740
1.433333    3.616666      0.803730      0.000740
1.433333    3.616666      0.385044      0.000740
3.633333    7.700000      0.490057      0.000784 + 0.000825 + 0.000725 
3.633333    7.700000      0.685507      0.000784 + 0.000825 + 0.000725 
239.850000  340.716666    0.937618      0.000593 + 0.000557 + 0.000523 + 0.000489
239.850000  340.716666    0.829939      0.000593 + 0.000557 + 0.000523 + 0.000489
239.850000  340.716666    0.852264      0.000593 + 0.000557 + 0.000523 + 0.000489
239.850000  340.716666    0.820325      0.000593 + 0.000557 + 0.000523 + 0.000489

I was hoping if someone can please help me to figure out as to how to approach this issue. I am in the initial phase of my python learning and dont know how to approach this problem.

Thanks in advance !!

zipa · Accepted Answer

This should do it:

To get the map of desired values you can use:

df1_map = df1.set_index('time_elapsed')['delta_retention'].to_dict()

Which gives you:

{0.05: 0.000642,
 0.07: 0.0006929999999999999,
 1.75: 0.00074,
 3.75: 0.000784,
 4.5: 0.000825,
 7.25: 0.000725,
 277.5: 0.000593,
 281.25: 0.000557,
 285.0: 0.000523,
 288.75: 0.000489}

Now you can create a column you want that maps the values that fall between Start and End:

df2['cum_delta_retention'] = df2.apply(lambda x: ' + '.join([f'{v:.6f}' for k, v in df1_map.items() if k >= x.Start and k <= x.End]), axis=1)

Which will give you:

         Start         End  Confidence                        cum_delta_retention
0     0.000000    1.416666    0.319943                        0.000642 + 0.000693
1     0.000000    1.416666    0.424269                        0.000642 + 0.000693
2     0.000000    1.416666    0.605917                        0.000642 + 0.000693
3     0.000000    1.416666    0.602294                        0.000642 + 0.000693
4     0.000000    1.416666    0.562171                        0.000642 + 0.000693
5     0.000000    1.416666    0.790947                        0.000642 + 0.000693
6     0.000000    1.416666    0.545534                        0.000642 + 0.000693
7     1.433333    3.616666    0.618783                                   0.000740
8     1.433333    3.616666    0.803730                                   0.000740
9     1.433333    3.616666    0.385044                                   0.000740
10    3.633333    7.700000    0.490057             0.000784 + 0.000825 + 0.000725
11    3.633333    7.700000    0.685507             0.000784 + 0.000825 + 0.000725
12  239.850000  340.716666    0.937618  0.000593 + 0.000557 + 0.000523 + 0.000489
13  239.850000  340.716666    0.829939  0.000593 + 0.000557 + 0.000523 + 0.000489
14  239.850000  340.716666    0.852264  0.000593 + 0.000557 + 0.000523 + 0.000489
15  239.850000  340.716666    0.820325  0.000593 + 0.000557 + 0.000523 + 0.000489

Create a column by comparing two pandas dataframes

Answers (1)

Related Questions