MWK
MWK

Reputation: 37

How to group data of one dataframe by matching the condition on another dataframe, when both the dataframes has exactly same columns and index?

I have a dataframe, say df1 like this:

    1   2   3   4   
1   5   2   3   14  
2   20  18  12  16  
3   13  21  32  4  
4   17  25  32  13 

and another dataframe, say df2 like this:

    1   2   3   4 
1   13  19  45  56 
2   45  54  28  31
3   33  45  32  9
4   23  65  14  15

I want to fill this third dataframe, say df3:

Lower   Upper   Val
0       5       73
5       10      13
10      15      132
15      20      ...
20      25      ...
25      30      ...

Now, to fill the 'val' column in df3, the code should first get the cell locations in df1 which lies between the given limits in df3 and then adds the corresponding values of df2 not df1.

I could able to do it using nested-loops but I need something without loops.

Thank you in advance.

Upvotes: 1

Views: 277

Answers (1)

jezrael
jezrael

Reputation: 862761

Idea is create IntervalIndex.from_arrays from Lower and Upper columns. Then reshape first and second DataFrames by DataFrame.stack and for first use binning by cut used for groups for aggregation by sum:

df3.index = pd.IntervalIndex.from_arrays(df3['Lower'], df3['Upper'], closed='left')
print (df3.index)
IntervalIndex([[0, 5), [5, 10), [10, 15), [15, 20), [20, 25), [25, 30)],
              closed='left',
              dtype='interval[int64]')

df3['Val'] = df2.stack().groupby(pd.cut(df1.stack(), df3.index)).sum()
print (df3)
          Lower  Upper  Val
[0, 5)        0      5   73
[5, 10)       5     10   13
[10, 15)     10     15  132
[15, 20)     15     20  108
[20, 25)     20     25   90
[25, 30)     25     30   65

Last create default index:

df3 = df3.reset_index(drop=True)
print (df3)
   Lower  Upper  Val
0      0      5   73
1      5     10   13
2     10     15  132
3     15     20  108
4     20     25   90
5     25     30   65

Details:

print (df1.stack())
1  1     5
   2     2
   3     3
   4    14
2  1    20
   2    18
   3    12
   4    16
3  1    13
   2    21
   3    32
   4     4
4  1    17
   2    25
   3    32
   4    13
dtype: int64

print (pd.cut(df1.stack(), df3.index))
1  1     [5.0, 10.0)
   2      [0.0, 5.0)
   3      [0.0, 5.0)
   4    [10.0, 15.0)
2  1    [20.0, 25.0)
   2    [15.0, 20.0)
   3    [10.0, 15.0)
   4    [15.0, 20.0)
3  1    [10.0, 15.0)
   2    [20.0, 25.0)
   3             NaN
   4      [0.0, 5.0)
4  1    [15.0, 20.0)
   2    [25.0, 30.0)
   3             NaN
   4    [10.0, 15.0)
dtype: category
Categories (6, interval[int64]): [[0, 5) < [5, 10) < [10, 15) < [15, 20) < [20, 25) < [25, 30)]

print (df2.stack())
1  1    13
   2    19
   3    45
   4    56
2  1    45
   2    54
   3    28
   4    31
3  1    33
   2    45
   3    32
   4     9
4  1    23
   2    65
   3    14
   4    15
dtype: int64

Upvotes: 1

Related Questions