Reputation: 37
I have a dataframe, say df1 like this:
1 2 3 4
1 5 2 3 14
2 20 18 12 16
3 13 21 32 4
4 17 25 32 13
and another dataframe, say df2 like this:
1 2 3 4
1 13 19 45 56
2 45 54 28 31
3 33 45 32 9
4 23 65 14 15
I want to fill this third dataframe, say df3:
Lower Upper Val
0 5 73
5 10 13
10 15 132
15 20 ...
20 25 ...
25 30 ...
Now, to fill the 'val' column in df3, the code should first get the cell locations in df1 which lies between the given limits in df3 and then adds the corresponding values of df2 not df1.
I could able to do it using nested-loops but I need something without loops.
Thank you in advance.
Upvotes: 1
Views: 277
Reputation: 862761
Idea is create IntervalIndex.from_arrays
from Lower
and Upper
columns. Then reshape first and second DataFrames by DataFrame.stack
and for first use binning by cut
used for groups for aggregation by sum
:
df3.index = pd.IntervalIndex.from_arrays(df3['Lower'], df3['Upper'], closed='left')
print (df3.index)
IntervalIndex([[0, 5), [5, 10), [10, 15), [15, 20), [20, 25), [25, 30)],
closed='left',
dtype='interval[int64]')
df3['Val'] = df2.stack().groupby(pd.cut(df1.stack(), df3.index)).sum()
print (df3)
Lower Upper Val
[0, 5) 0 5 73
[5, 10) 5 10 13
[10, 15) 10 15 132
[15, 20) 15 20 108
[20, 25) 20 25 90
[25, 30) 25 30 65
Last create default index:
df3 = df3.reset_index(drop=True)
print (df3)
Lower Upper Val
0 0 5 73
1 5 10 13
2 10 15 132
3 15 20 108
4 20 25 90
5 25 30 65
Details:
print (df1.stack())
1 1 5
2 2
3 3
4 14
2 1 20
2 18
3 12
4 16
3 1 13
2 21
3 32
4 4
4 1 17
2 25
3 32
4 13
dtype: int64
print (pd.cut(df1.stack(), df3.index))
1 1 [5.0, 10.0)
2 [0.0, 5.0)
3 [0.0, 5.0)
4 [10.0, 15.0)
2 1 [20.0, 25.0)
2 [15.0, 20.0)
3 [10.0, 15.0)
4 [15.0, 20.0)
3 1 [10.0, 15.0)
2 [20.0, 25.0)
3 NaN
4 [0.0, 5.0)
4 1 [15.0, 20.0)
2 [25.0, 30.0)
3 NaN
4 [10.0, 15.0)
dtype: category
Categories (6, interval[int64]): [[0, 5) < [5, 10) < [10, 15) < [15, 20) < [20, 25) < [25, 30)]
print (df2.stack())
1 1 13
2 19
3 45
4 56
2 1 45
2 54
3 28
4 31
3 1 33
2 45
3 32
4 9
4 1 23
2 65
3 14
4 15
dtype: int64
Upvotes: 1