Reputation: 1105
I'm trying to aggregate the below given table 1 to table 2 with Excel power queries.
The goal is to merge continuous time intervals of the same group into a single row. For direct overlappings like event 5 and 6 this was quite easy. But this approach only merged event 1 and 2 and 2 and 3 resulting in two entries (see Table 1b).
The problem is the "transitive" dependence of e.g. event 1 with event 3 via event 2. This dependence can span more than 3 rows. So the transitive closure must be determined.
Programming it could iteratively apply the current solution until no more changes occur. But how to do it in power query?
Table 1 (original):
Event ID | Group | Start | End |
---|---|---|---|
1 | A | 20.01.2022 12:00:00 | 20.01.2022 12:02:00 |
2 | A | 20.01.2022 12:01:00 | 20.01.2022 12:04:20 |
3 | A | 20.01.2022 12:03:10 | 20.01.2022 12:06:00 |
4 | A | 20.01.2022 12:08:00 | 20.01.2022 12:10:00 |
5 | B | 20.01.2022 12:00:50 | 20.01.2022 12:02:00 |
6 | B | 20.01.2022 12:01:00 | 20.01.2022 12:05:00 |
7 | B | 20.01.2022 12:06:00 | 20.01.2022 12:11:00 |
Table 1b (current intermediate solution):
Event ID | Group | Start | End |
---|---|---|---|
1 | A | 20.01.2022 12:00:00 | 20.01.2022 12:04:20 |
2 | A | 20.01.2022 12:01:00 | 20.01.2022 12:06:00 |
4 | A | 20.01.2022 12:08:00 | 20.01.2022 12:10:00 |
5 | B | 20.01.2022 12:00:50 | 20.01.2022 12:05:00 |
7 | B | 20.01.2022 12:06:00 | 20.01.2022 12:11:00 |
Table 2 (desired result):
Event ID | Group | Start | End |
---|---|---|---|
1 | A | 20.01.2022 12:00:00 | 20.01.2022 12:06:00 |
4 | A | 20.01.2022 12:08:00 | 20.01.2022 12:10:00 |
5 | B | 20.01.2022 12:00:50 | 20.01.2022 12:05:00 |
7 | B | 20.01.2022 12:06:00 | 20.01.2022 12:11:00 |
Example which wasn't aggregated fully with the provided solution:
Event ID | Group | Start | End |
---|---|---|---|
1 | A | 20.01.2022 12:02:12 | 20.01.2022 12:05:34 |
2 | A | 20.01.2022 12:02:54 | 20.01.2022 12:05:37 |
3 | A | 20.01.2022 12:05:36 | 20.01.2022 12:05:49 |
4 | A | 20.01.2022 12:05:45 | 20.01.2022 12:07:22 |
5 | A | 20.01.2022 12:06:03 | 20.01.2022 12:06:10 |
results in (previous solution):
Event ID | Group | Start | End |
---|---|---|---|
1 | A | 20.01.2022 12:02:12 | 20.01.2022 12:07:22 |
5 | A | 20.01.2022 12:02:54 | 20.01.2022 12:07:22 |
results in (accepted answer):
Event ID | Group | Start | End |
---|---|---|---|
1 | A | 20.01.2022 12:02:12 | 20.01.2022 12:07:22 |
Upvotes: 1
Views: 607
Reputation: 21298
EDIT See better later answer at Aggregate overlapping time intervals, calculate outage times, improve performance
Step 1: Create separate query, name it process, close and load it before proceeding
(xtable)=>
// for each group, compare each list against all lists in column Custom, and merge those that overlap
let Source= Table.Buffer(xtable),
#"Added Custom"= Table.AddColumn(
Source,
"Custom2",
each let
begin = [Custom]
in
List.Accumulate (
Source[Custom],
begin,
(state,current)=> if List.ContainsAny(state,current) then List.Distinct(List.Combine({current,state})) else state
)
),
// count the number of changes made from original version. If this is not zero, we will recurse the changes
x= List.Sum(List.Transform(List.Positions(#"Added Custom"[Custom]), each if #"Added Custom"[Custom]{_} = #"Added Custom"[Custom2]{_} then 0 else 1)),
RemovePrioCustom= Table.RemoveColumns(#"Added Custom",{"Custom"}),
AddNewCustom= Table.RenameColumns(RemovePrioCustom,{{"Custom2", "Custom"}}),
recursive = if x=0 then AddNewCustom else @process( AddNewCustom)
in recursive
Step 2: Code for table that uses above function:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Start", type datetime}, {"End", type datetime}}),
#"Grouped Rows" = Table.Group(#"Changed Type" , {"Group"}, {{"data", each
let #"AddIndex" = Table.AddIndexColumn(_, "Index", 0, 1, Int64.Type),
#"Round1" = Table.AddColumn(#"AddIndex", "Custom", (i)=>Table.SelectRows(#"AddIndex", each ([Start]>=i[Start] and [Start]<=i[End]) or ([Start]<=i[Start] and [End]>=i[Start]) )[Index]),
#"Round2"= process(#"Round1"),
#"GetStart" = Table.AddColumn(#"Round2", "StartMin", each List.Min(List.Transform([Custom], each #"Round2"[Start]{_})),type datetime),
#"GetEnd"= Table.AddColumn(#"GetStart", "EndMax", each List.Max(List.Transform([Custom], each #"GetStart"[End]{_})), type datetime),
#"RemoveColumns" = Table.RemoveColumns(#"GetEnd",{"Start", "End", "Index", "Custom"}),
#"CleanUp" = Table.Distinct(#"RemoveColumns", {"Group", "StartMin", "EndMax"})
in #"CleanUp", type table }}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Event ID", "StartMin", "EndMax"}, {"Event ID", "StartMin", "EndMax"})
in #"Expanded data"
The function calls itself until no more changes can be made, so should work as deep as needed
Upvotes: 3