connor449
connor449

Reputation: 1679

How to get total of groupby cumsum row by row

I have a df that looks like this:

519   962.966667   91.525424  out_of_range       0   55.932203
520   970.666667   91.525424  out_of_range       1   91.525424
521   971.766667   81.355932  out_of_range       2   91.525424
522   972.900000   76.271186  out_of_range       3   81.355932
523   974.000000   76.271186  out_of_range       4   76.271186
524   975.100000   76.271186  out_of_range       5   76.271186
525   975.833333   76.271186  out_of_range       6   76.271186
526   977.066667   76.271186  out_of_range       7   76.271186
527   977.933333   76.271186  out_of_range       8   76.271186
528   978.833333   76.271186  out_of_range       9   76.271186
529   980.066667   55.932203      in_range       0   76.271186
530   981.200000   55.932203      in_range       1   55.932203
531   985.933333   66.101695      in_range       2   55.932203
532   987.566667   66.101695      in_range       3   66.101695
533   989.033333   55.932203      in_range       4   66.101695
534   991.000000  111.864407  out_of_range       0   55.932203
535  1004.900000  111.864407  out_of_range       1  111.864407
536  1006.033333  111.864407  out_of_range       2  111.864407
537  1007.166667   66.101695      in_range       0  111.864407
538  1008.300000   66.101695      in_range       1   66.101695

df[3] indicates where a certain value is in or out a set range. df[4] indicates the cumulative count for each in_range or out_out_range group.

How do I create a column that applies the size of each in_range out_of_range group to the entire group, row by row, like this (last column):

519   962.966667   91.525424  out_of_range       0   55.932203   9
520   970.666667   91.525424  out_of_range       1   91.525424   9
521   971.766667   81.355932  out_of_range       2   91.525424   9
522   972.900000   76.271186  out_of_range       3   81.355932   9
523   974.000000   76.271186  out_of_range       4   76.271186   9
524   975.100000   76.271186  out_of_range       5   76.271186   9
525   975.833333   76.271186  out_of_range       6   76.271186   9
526   977.066667   76.271186  out_of_range       7   76.271186   9
527   977.933333   76.271186  out_of_range       8   76.271186   9
528   978.833333   76.271186  out_of_range       9   76.271186   9
529   980.066667   55.932203      in_range       0   76.271186   4
530   981.200000   55.932203      in_range       1   55.932203   4
531   985.933333   66.101695      in_range       2   55.932203   4
532   987.566667   66.101695      in_range       3   66.101695   4
533   989.033333   55.932203      in_range       4   66.101695   4
534   991.000000  111.864407  out_of_range       0   55.932203   2 
535  1004.900000  111.864407  out_of_range       1  111.864407   2
536  1006.033333  111.864407  out_of_range       2  111.864407   2
537  1007.166667   66.101695      in_range       0  111.864407   1
538  1008.300000   66.101695      in_range       1   66.101695   1

Upvotes: 1

Views: 26

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150745

I'm not sure how you get the cumcount originally. You could have change groupby().cumcount() to groupby().size() to get the desired numbers.

That said, with the current dataframe, you can use cumsum() to identify the blocks and groupby().transform():

df['cumcount'] = df[4].groupby(df[4].eq(0).cumsum()).transform('max')

Output:

      0            1           2             3  4           5  cumcount
0   519   962.966667   91.525424  out_of_range  0   55.932203         9
1   520   970.666667   91.525424  out_of_range  1   91.525424         9
2   521   971.766667   81.355932  out_of_range  2   91.525424         9
3   522   972.900000   76.271186  out_of_range  3   81.355932         9
4   523   974.000000   76.271186  out_of_range  4   76.271186         9
5   524   975.100000   76.271186  out_of_range  5   76.271186         9
6   525   975.833333   76.271186  out_of_range  6   76.271186         9
7   526   977.066667   76.271186  out_of_range  7   76.271186         9
8   527   977.933333   76.271186  out_of_range  8   76.271186         9
9   528   978.833333   76.271186  out_of_range  9   76.271186         9
10  529   980.066667   55.932203      in_range  0   76.271186         4
11  530   981.200000   55.932203      in_range  1   55.932203         4
12  531   985.933333   66.101695      in_range  2   55.932203         4
13  532   987.566667   66.101695      in_range  3   66.101695         4
14  533   989.033333   55.932203      in_range  4   66.101695         4
15  534   991.000000  111.864407  out_of_range  0   55.932203         2
16  535  1004.900000  111.864407  out_of_range  1  111.864407         2
17  536  1006.033333  111.864407  out_of_range  2  111.864407         2
18  537  1007.166667   66.101695      in_range  0  111.864407         1
19  538  1008.300000   66.101695      in_range  1   66.101695         1

Upvotes: 1

Related Questions