Reputation: 1679
I have a df that looks like this:
519 962.966667 91.525424 out_of_range 0 55.932203
520 970.666667 91.525424 out_of_range 1 91.525424
521 971.766667 81.355932 out_of_range 2 91.525424
522 972.900000 76.271186 out_of_range 3 81.355932
523 974.000000 76.271186 out_of_range 4 76.271186
524 975.100000 76.271186 out_of_range 5 76.271186
525 975.833333 76.271186 out_of_range 6 76.271186
526 977.066667 76.271186 out_of_range 7 76.271186
527 977.933333 76.271186 out_of_range 8 76.271186
528 978.833333 76.271186 out_of_range 9 76.271186
529 980.066667 55.932203 in_range 0 76.271186
530 981.200000 55.932203 in_range 1 55.932203
531 985.933333 66.101695 in_range 2 55.932203
532 987.566667 66.101695 in_range 3 66.101695
533 989.033333 55.932203 in_range 4 66.101695
534 991.000000 111.864407 out_of_range 0 55.932203
535 1004.900000 111.864407 out_of_range 1 111.864407
536 1006.033333 111.864407 out_of_range 2 111.864407
537 1007.166667 66.101695 in_range 0 111.864407
538 1008.300000 66.101695 in_range 1 66.101695
df[3] indicates whether a certain value is inside or outside a set range. df[4] indicates the cumulative count within each in_range or out_of_range group.
How do I create a column that applies the size of each in_range/out_of_range group to every row of that group, like this (last column):
519 962.966667 91.525424 out_of_range 0 55.932203 9
520 970.666667 91.525424 out_of_range 1 91.525424 9
521 971.766667 81.355932 out_of_range 2 91.525424 9
522 972.900000 76.271186 out_of_range 3 81.355932 9
523 974.000000 76.271186 out_of_range 4 76.271186 9
524 975.100000 76.271186 out_of_range 5 76.271186 9
525 975.833333 76.271186 out_of_range 6 76.271186 9
526 977.066667 76.271186 out_of_range 7 76.271186 9
527 977.933333 76.271186 out_of_range 8 76.271186 9
528 978.833333 76.271186 out_of_range 9 76.271186 9
529 980.066667 55.932203 in_range 0 76.271186 4
530 981.200000 55.932203 in_range 1 55.932203 4
531 985.933333 66.101695 in_range 2 55.932203 4
532 987.566667 66.101695 in_range 3 66.101695 4
533 989.033333 55.932203 in_range 4 66.101695 4
534 991.000000 111.864407 out_of_range 0 55.932203 2
535 1004.900000 111.864407 out_of_range 1 111.864407 2
536 1006.033333 111.864407 out_of_range 2 111.864407 2
537 1007.166667 66.101695 in_range 0 111.864407 1
538 1008.300000 66.101695 in_range 1 66.101695 1
Upvotes: 1
Views: 26
Reputation: 150745
I'm not sure how you computed the cumcount originally, but you could change that groupby().cumcount() to groupby().transform('size') to broadcast each group's size directly.
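As a minimal sketch (assuming, since the original code isn't shown, that the groups were keyed on consecutive runs of column 3):
# hypothetical reconstruction: a new group starts whenever column 3
# changes value between consecutive rows
key = df[3].ne(df[3].shift()).cumsum()
# transform('size') broadcasts each group's row count to all of its rows;
# note it counts rows, so the first block yields 10 rather than the 9 in
# your example (which is the maximum cumcount)
df['size'] = df.groupby(key)[3].transform('size')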
That said, with the current dataframe, you can use cumsum() to identify the blocks and groupby().transform():
# blocks start where the cumcount resets to 0; 'max' broadcasts each block's last cumcount
df['cumcount'] = df[4].groupby(df[4].eq(0).cumsum()).transform('max')
Output:
0 1 2 3 4 5 cumcount
0 519 962.966667 91.525424 out_of_range 0 55.932203 9
1 520 970.666667 91.525424 out_of_range 1 91.525424 9
2 521 971.766667 81.355932 out_of_range 2 91.525424 9
3 522 972.900000 76.271186 out_of_range 3 81.355932 9
4 523 974.000000 76.271186 out_of_range 4 76.271186 9
5 524 975.100000 76.271186 out_of_range 5 76.271186 9
6 525 975.833333 76.271186 out_of_range 6 76.271186 9
7 526 977.066667 76.271186 out_of_range 7 76.271186 9
8 527 977.933333 76.271186 out_of_range 8 76.271186 9
9 528 978.833333 76.271186 out_of_range 9 76.271186 9
10 529 980.066667 55.932203 in_range 0 76.271186 4
11 530 981.200000 55.932203 in_range 1 55.932203 4
12 531 985.933333 66.101695 in_range 2 55.932203 4
13 532 987.566667 66.101695 in_range 3 66.101695 4
14 533 989.033333 55.932203 in_range 4 66.101695 4
15 534 991.000000 111.864407 out_of_range 0 55.932203 2
16 535 1004.900000 111.864407 out_of_range 1 111.864407 2
17 536 1006.033333 111.864407 out_of_range 2 111.864407 2
18 537 1007.166667 66.101695 in_range 0 111.864407 1
19 538 1008.300000 66.101695 in_range 1 66.101695 1
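For a quick sanity check, the intermediate block key looks like this:
# df[4].eq(0) is True at each cumcount reset; cumsum turns the resets
# into an increasing block id: 1 for rows 519-528, 2 for 529-533,
# 3 for 534-536 and 4 for 537-538
blocks = df[4].eq(0).cumsum()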
Upvotes: 1