Geppelt
Geppelt

Reputation: 363

Determining if date is between start/end dates, and averaging days since start

I have a long list of processes that all have start and end dates. I would like to be able to find how many processes, that have run for more than a day, are running on a given date, and the average time since start. In the end I want to make a plot showing the average process time over time. What I am having trouble with is not identifying which processes are running on a give date, but averaging the time since start for all processes that running on a given date.

Example Date ('Raw' sheet): ![enter image description here

Desired Results ('Data' sheet): ![enter image description here

To get the number of processes running on a given day, I used the following:

=COUNTIFS(raw!B:B, "<" & data!A2, raw!C:C, ">" & data!A2) + COUNTIFS(raw!B:B, "<" & data!A2, raw!C:C, "=" & "")

Long description of above tables.: If I have a process that started on 1-1-20, and only that process is running on 1-1-20, then the 'Average Days Since Process Start' is 0 as it has been running less then a day. On 2-1-20, if still that is the only process running, then the 'Average Days Since Process Start' is 1 since that single process has been running for 1 day.

If I start a new process on 4-1-20, though there are two processes running yet the process with less than 1 day of operation is not be counted. So the average will be 3/1=3. Then on 5-1-20 the 'Average Days Since Process Start' is 2.5 as the first process has been running 4 days, and the second 1 day. (1+4)/2 = 2.5

If I start yet another process on 10-1-20, but stopped the first process on 9-1-20, then on 11-1-20 the 'Average Days Since Process Start' is 4, as the second process has been running for 7 days, and the third process for 1. (1+7)/2 = 4

Upvotes: 3

Views: 228

Answers (3)

Ferdinando
Ferdinando

Reputation: 964

I have found this solution:

The layout is little different from yours, but you can after create your own layout.

  1. In ROWS 9 there are the dates.
  2. In Column A there are the processes
  3. The row of single process there are the number of the days that the process is running.
  4. Row(15) Days there are the days to start from zero.
  5. Row(16)Processes active there are the numbers of processes active in that day. In my example the formula, in cells B16, is COUNTIF(B10:B14,">" & 0).
  6. Row(17) AVERAGE: there are the everages. The formula is, in cells(B17), IF.ERROR(B15/SUMM(B10:B14),0).

enter image description here

Upvotes: 0

Tom Sharpe
Tom Sharpe

Reputation: 34180

I can only get the same answers as OP if I use a formula like this to count the number of active processes at a particular date:

=COUNTIFS(B$2:B$10,"<"&F2,C$2:C$10,">="&F2)

This gives one less than the number of processes shown in the question because a process is not included if it starts on the current date. This seems inconsistent with the description given in the 'Clarification' section of the question.

I can then sum up the number of days like this:

=SUMPRODUCT((F2-B$2:B$10)*(B$2:B$10<=F2)*(C$2:C$10>=F2))

and divide one by the other to get the average days per process. Note the <= and >= in the above formula.

    =IFERROR(H2/G2,0)

enter image description here

Upvotes: 3

AnilGoyal
AnilGoyal

Reputation: 26218

My proposed solution in this case is -

Data used-

enter image description here

Step-1 select raw table then Data (Menu) > Get & Transform (from table)

Step-2 change the data-type of dates column to number

Step-3 Create a custom column as

enter image description here

calculation used-

={[Start]..[End]}

Step-4 Expand dates by right clicking the column, you'll get an output like this

enter image description here

Step[-5 change data-type of this new column to dates

Step-6 Delete two start and end columns (these aren't needed now)

Step-7 Group-by Process column by right clicking it.. (output will be like the following screenshot)

enter image description here

Step-8 add another custom column with following calculation

enter image description here

Table.AddIndexColumn([Count],"Index",0)

Step-9 remove other columns by right clicking new column

Step-10 expand this column to get an output like this

enter image description here

Step-11 remove process column

Step-12 Group-by again with following settings

enter image description here

You'll get desired output like this

enter image description here

All query steps can be re-checked with this screenshot

enter image description here

Good Luck. Formatted final output

enter image description here

PS I would've solved first part in powerquery also. If you need the steps do tell me.

Upvotes: 0

Related Questions