Reputation: 363
I have a long list of processes that all have start and end dates. I would like to be able to find how many processes, that have run for more than a day, are running on a given date, and the average time since start. In the end I want to make a plot showing the average process time over time. What I am having trouble with is not identifying which processes are running on a give date, but averaging the time since start for all processes that running on a given date.
Desired Results ('Data' sheet):
To get the number of processes running on a given day, I used the following:
=COUNTIFS(raw!B:B, "<" & data!A2, raw!C:C, ">" & data!A2) + COUNTIFS(raw!B:B, "<" & data!A2, raw!C:C, "=" & "")
Long description of above tables.: If I have a process that started on 1-1-20, and only that process is running on 1-1-20, then the 'Average Days Since Process Start' is 0 as it has been running less then a day. On 2-1-20, if still that is the only process running, then the 'Average Days Since Process Start' is 1 since that single process has been running for 1 day.
If I start a new process on 4-1-20, though there are two processes running yet the process with less than 1 day of operation is not be counted. So the average will be 3/1=3. Then on 5-1-20 the 'Average Days Since Process Start' is 2.5 as the first process has been running 4 days, and the second 1 day. (1+4)/2 = 2.5
If I start yet another process on 10-1-20, but stopped the first process on 9-1-20, then on 11-1-20 the 'Average Days Since Process Start' is 4, as the second process has been running for 7 days, and the third process for 1. (1+7)/2 = 4
Upvotes: 3
Views: 228
Reputation: 964
I have found this solution:
The layout is little different from yours, but you can after create your own layout.
ROWS 9
there are the dates.Column A
there are the processesRow(15) Days
there are the days to start from zero.Row(16)Processes
active there are the numbers of processes active in that day. In my
example the formula, in cells B16, is COUNTIF(B10:B14,">" & 0)
.IF.ERROR(B15/SUMM(B10:B14),0)
.Upvotes: 0
Reputation: 34180
I can only get the same answers as OP if I use a formula like this to count the number of active processes at a particular date:
=COUNTIFS(B$2:B$10,"<"&F2,C$2:C$10,">="&F2)
This gives one less than the number of processes shown in the question because a process is not included if it starts on the current date. This seems inconsistent with the description given in the 'Clarification' section of the question.
I can then sum up the number of days like this:
=SUMPRODUCT((F2-B$2:B$10)*(B$2:B$10<=F2)*(C$2:C$10>=F2))
and divide one by the other to get the average days per process. Note the <= and >= in the above formula.
=IFERROR(H2/G2,0)
Upvotes: 3
Reputation: 26218
My proposed solution in this case is -
Data used-
Step-1 select raw
table then Data (Menu) > Get & Transform (from table)
Step-2 change the data-type of dates column to number
Step-3 Create a custom column as
calculation used-
={[Start]..[End]}
Step-4 Expand dates by right clicking the column, you'll get an output like this
Step[-5 change data-type of this new column to dates
Step-6 Delete two start and end columns (these aren't needed now)
Step-7 Group-by Process column by right clicking it.. (output will be like the following screenshot)
Step-8 add another custom column with following calculation
Table.AddIndexColumn([Count],"Index",0)
Step-9 remove other columns by right clicking new column
Step-10 expand this column to get an output like this
Step-11 remove process column
Step-12 Group-by again with following settings
You'll get desired output like this
All query steps can be re-checked with this screenshot
Good Luck. Formatted final output
PS I would've solved first part in powerquery also. If you need the steps do tell me.
Upvotes: 0