Reputation: 560
Hello I am trying to create a new column in a data frame by copying values from a data frame column such that if the value of another column satisfies a condition a based on the columns of other two columns in another data frame.
To put things into more perspective, here are the two data frames
df1:
Time_Ratio time_elapsed delta_retention
Time_Ratio time_elapsed delta_retention
0.22 0.05 0.000642
0.23 0.07 0.000693
0.24 1.75 0.000740
0.25 3.75 0.000784
0.26 4.50 0.000825
0.27 7.25 0.000725
0.74 277.50 0.000593
0.75 281.25 0.000557
0.76 285.00 0.000523
0.77 288.75 0.000489
df2:
Start End Confidence
0.000000 1.416666 0.319943
0.000000 1.416666 0.424269
0.000000 1.416666 0.605917
0.000000 1.416666 0.602294
0.000000 1.416666 0.562171
0.000000 1.416666 0.790947
0.000000 1.416666 0.545534
1.433333 3.616666 0.618783
1.433333 3.616666 0.803730
1.433333 3.616666 0.385044
3.633333 7.700000 0.490057
3.633333 7.700000 0.685507
239.850000 340.716666 0.937618
239.850000 340.716666 0.829939
239.850000 340.716666 0.852264
239.850000 340.716666 0.820325
What I am trying to do is to find the cumulative delta retention between start and end for all time elapsed falling between start and end. This is what the final data frame looks like
Start End Confidence cum_delta_retention
0.000000 1.416666 0.319943 0.000642 + 0.000693
0.000000 1.416666 0.424269 0.000642 + 0.000693
0.000000 1.416666 0.605917 0.000642 + 0.000693
0.000000 1.416666 0.602294 0.000642 + 0.000693
0.000000 1.416666 0.562171 0.000642 + 0.000693
0.000000 1.416666 0.790947 0.000642 + 0.000693
0.000000 1.416666 0.545534 0.000642 + 0.000693
1.433333 3.616666 0.618783 0.000740
1.433333 3.616666 0.803730 0.000740
1.433333 3.616666 0.385044 0.000740
3.633333 7.700000 0.490057 0.000784 + 0.000825 + 0.000725
3.633333 7.700000 0.685507 0.000784 + 0.000825 + 0.000725
239.850000 340.716666 0.937618 0.000593 + 0.000557 + 0.000523 + 0.000489
239.850000 340.716666 0.829939 0.000593 + 0.000557 + 0.000523 + 0.000489
239.850000 340.716666 0.852264 0.000593 + 0.000557 + 0.000523 + 0.000489
239.850000 340.716666 0.820325 0.000593 + 0.000557 + 0.000523 + 0.000489
I was hoping if someone can please help me to figure out as to how to approach this issue. I am in the initial phase of my python learning and dont know how to approach this problem.
Thanks in advance !!
Upvotes: 1
Views: 92
Reputation: 27889
This should do it:
To get the map of desired values you can use:
df1_map = df1.set_index('time_elapsed')['delta_retention'].to_dict()
Which gives you:
{0.05: 0.000642,
0.07: 0.0006929999999999999,
1.75: 0.00074,
3.75: 0.000784,
4.5: 0.000825,
7.25: 0.000725,
277.5: 0.000593,
281.25: 0.000557,
285.0: 0.000523,
288.75: 0.000489}
Now you can create a column you want that maps the values that fall between Start
and End
:
df2['cum_delta_retention'] = df2.apply(lambda x: ' + '.join([f'{v:.6f}' for k, v in df1_map.items() if k >= x.Start and k <= x.End]), axis=1)
Which will give you:
Start End Confidence cum_delta_retention
0 0.000000 1.416666 0.319943 0.000642 + 0.000693
1 0.000000 1.416666 0.424269 0.000642 + 0.000693
2 0.000000 1.416666 0.605917 0.000642 + 0.000693
3 0.000000 1.416666 0.602294 0.000642 + 0.000693
4 0.000000 1.416666 0.562171 0.000642 + 0.000693
5 0.000000 1.416666 0.790947 0.000642 + 0.000693
6 0.000000 1.416666 0.545534 0.000642 + 0.000693
7 1.433333 3.616666 0.618783 0.000740
8 1.433333 3.616666 0.803730 0.000740
9 1.433333 3.616666 0.385044 0.000740
10 3.633333 7.700000 0.490057 0.000784 + 0.000825 + 0.000725
11 3.633333 7.700000 0.685507 0.000784 + 0.000825 + 0.000725
12 239.850000 340.716666 0.937618 0.000593 + 0.000557 + 0.000523 + 0.000489
13 239.850000 340.716666 0.829939 0.000593 + 0.000557 + 0.000523 + 0.000489
14 239.850000 340.716666 0.852264 0.000593 + 0.000557 + 0.000523 + 0.000489
15 239.850000 340.716666 0.820325 0.000593 + 0.000557 + 0.000523 + 0.000489
Upvotes: 1