Ziv

Reputation: 2785

How to split a google sheet range into parts by sum

I have a three-column sheet: Name (ID), Length (timespan), and Category (arbitrary; could be a sequence). I would like to fill the Category column automatically so that each category has the same sum of the Length column. Currently I split the Category column evenly using the formula =TRANSPOSE(SPLIT(JOIN(",", ARRAYFORMULA(REPT(G2:G9&",", H2))), ",")), which I copied from this site. Since the values in the Length column vary a lot, I get categories whose total lengths differ widely.

This is example data from my sheet:

Chapter       Length    Due Date (Category)
Chapter 2     00:23:43  07/06/2020
Chapter 3     00:19:01  07/06/2020
Chapter 4     00:13:29  07/06/2020
Chapter 5     00:13:00  07/06/2020
Chapter 6     00:07:56  07/06/2020
Chapter 7     00:12:38  08/06/2020
Chapter 8     00:15:20  08/06/2020
Chapter 9     00:23:51  08/06/2020
Chapter 10    00:29:40  08/06/2020
Chapter 11    00:23:37  08/06/2020
Chapter 12    00:15:39  09/06/2020
Chapter 13    00:27:07  09/06/2020
Chapter 14    00:09:18  09/06/2020
Chapter 15    00:21:52  09/06/2020
Chapter 16    00:31:35  09/06/2020
Chapter 17    00:21:17  10/06/2020
Chapter 18    00:57:07  10/06/2020
Chapter 19    00:24:42  10/06/2020
Chapter 20    00:20:24  10/06/2020
Chapter 21    00:32:28  10/06/2020
Chapter 22    00:35:17  11/06/2020
Chapter 23    00:25:54  11/06/2020
Chapter 24    00:26:35  11/06/2020
Chapter 25    00:21:25  11/06/2020
Chapter 26    00:37:04  11/06/2020
Chapter 27    00:24:27  12/06/2020
Chapter 28    00:05:15  12/06/2020
Chapter 29    00:07:29  12/06/2020
Chapter 30    00:41:52  12/06/2020
Chapter 31    00:43:30  12/06/2020
Chapter 32    00:34:31  13/06/2020
Chapter 33    00:45:24  13/06/2020
Chapter 34    00:20:02  13/06/2020
Chapter 35    00:14:43  13/06/2020
Chapter 36    00:23:56  13/06/2020

And this is the result of the query =QUERY(A2:D56,"select D, count(D),sum(C) where D is not null group by D") which groups by the category (Due Date) and prints the sum of times:

Due Date     sum 
07/06/2020   1:17:09
08/06/2020   1:45:06
09/06/2020   1:45:31
10/06/2020   2:35:58
11/06/2020   2:26:15
12/06/2020   2:02:33
13/06/2020   2:18:36

I would like the sums in this table to be as equal as possible, like this:

Due Date     sum 
07/06/2020   ~2:01:35
08/06/2020   ~2:01:35
09/06/2020   ~2:01:35
10/06/2020   ~2:01:35
11/06/2020   ~2:01:35
12/06/2020   ~2:01:35
13/06/2020   ~2:01:35

Upvotes: 1

Views: 1522

Answers (1)

Tom Sharpe

Reputation: 34230

You may be able to improve on this, but as a first cut I would just divide the cumulative elapsed time by the average time per day (about 2 hours, as you found) and use that to look up the corresponding date:

=ArrayFormula(to_date(vlookup(sumif(row(B2:B36),"<="&row(B2:B36),B2:B36)/(sum(B2:B36)/countunique(C2:C36)),
{sequence(countunique(C2:C36),1,0),unique(C2:C36)},2)))
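
To make the logic of the formula easier to follow, here is a minimal sketch of the same idea in Python rather than Sheets. It assumes the lengths are positive whole numbers of seconds; the function name assign_buckets and the sample values are made up for illustration. Each row's running total is divided by the average total per bucket, and the floored result picks the bucket. The result is capped at the last bucket because an approximate-match VLOOKUP returns the last key that does not exceed the search value.

```python
from itertools import accumulate


def assign_buckets(lengths, n_buckets):
    """Return a 0-based bucket index for each length, in row order."""
    total = sum(lengths)
    buckets = []
    for cumulative in accumulate(lengths):
        # Same as floor(cumulative / (total / n_buckets)), done in integer
        # arithmetic so no floating-point rounding can shift a row.
        index = cumulative * n_buckets // total
        # The final row reaches index == n_buckets; cap it at the last bucket,
        # mirroring what the approximate-match lookup does.
        buckets.append(min(index, n_buckets - 1))
    return buckets


# Hypothetical sample: six lengths in seconds, split into three buckets.
print(assign_buckets([30, 30, 60, 20, 40, 60], 3))  # [0, 0, 1, 1, 2, 2]
```

The returned indices play the role of the 0-based keys built by sequence(...); in the sheet they are mapped to the dates from unique(C2:C36).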

I have put the cumulative time elapsed next to the date so you can see how well (or badly) the time divides equally per day:

[Screenshot: cumulative elapsed time shown next to the assigned dates]

The UK-style dates have slightly messed up my formatting, which is ironic since I am in the UK but my sheet defaults to US :-(

EDIT

I think you can improve the fit by adding half the average time per chapter to the lookup:

=ArrayFormula(to_date(vlookup((sumif(row(B2:B36),"<="&row(B2:B36),B2:B36)+average(B2:B36)/2)/(sum(B2:B36)/countunique(C2:C36)),
{sequence(countunique(C2:C36),1,0),unique(C2:C36)},2)))
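
The adjusted lookup can be sketched in Python as well. This is only an illustration of the arithmetic, not the formula itself. It assumes positive whole-second lengths, and the function names are hypothetical. Half the average length is added to each running total before dividing by the average total per bucket; Fraction keeps the division exact. The second helper totals each bucket so the evenness of a split can be checked.

```python
from fractions import Fraction
from itertools import accumulate


def assign_buckets_with_offset(lengths, n_buckets):
    """Cumulative split, shifted by half the average length per row."""
    total = sum(lengths)
    half_average = Fraction(total, 2 * len(lengths))  # average(B2:B36)/2
    per_bucket = Fraction(total, n_buckets)           # sum(...)/countunique(...)
    buckets = []
    for cumulative in accumulate(lengths):
        index = (cumulative + half_average) // per_bucket  # exact floor
        # Cap at the last bucket, as the approximate-match lookup would.
        buckets.append(min(int(index), n_buckets - 1))
    return buckets


def bucket_sums(lengths, buckets, n_buckets):
    """Total length per bucket, for judging how even the split is."""
    sums = [0] * n_buckets
    for length, bucket in zip(lengths, buckets):
        sums[bucket] += length
    return sums
```

Whether the offset helps depends on the data, so comparing the per-bucket sums with and without it on your own lengths is the simplest check.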

You can use full-column ranges if you want to:

=ArrayFormula(filter(to_date(vlookup((sumif(row(B2:B),"<="&row(B2:B),B2:B)+average(B2:B)/2)/(sum(B2:B)/countunique(C2:C)),
{sequence(countunique(C2:C),1,0),unique(filter(C2:C,C2:C<>""))},2)),A2:A<>""))

Upvotes: 2
