Piazza
Piazza

Reputation: 65

Group rows based on contiguous column data

I have a dataset table in PowerBi with this structure:

Dataset Table Structure

It's pretty straight forward; the column "Duration (s)" is the difference in seconds between "DateTime End" and "DateTime Start.

I need to create a visualization (a table/matrix) which would give me this result:

Table Visualization

Basically it's a sort of group by of columns "Product" and "Machine" based on contiguous values of the column "Event Code" where i keep the first row "DateTime Start" and the last row "DateTime End", sum the rows "Duration (s)" values (or calculate it ex-novo, since it's always a time diff) and count the contiguous rows with the same "Event Code" in the column "Rows Count".

Since the dataset is fairly large (5m+ rows) i'd prefer a solution based on Dax rather than an M-based one, but any will suffice.

Upvotes: 2

Views: 712

Answers (2)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60494

With M code, starting with your initial table, where I presume you've added a custom column for Duration (s):

  • Group by the Product, Machine and Event code columns
  • Add your various aggregations for
    • min start
    • max end
    • sum duration
    • row count

Then, open the Advanced Editor and add the optional final argument for the Table.Group function: GroupKind.Local This is what that argument is for

let

//orginal table
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hZLBCsMgDIZfZeRcqInald1s77t0t+Jp9AXKxl5/rqcm0QVR+CAfJr+uKyzPfftABw8sRyqb+rIc0WW4oRMcIXeGEjmT08qkSjijraDgYCuBs680llQJZ6+VWZUwDuIW0oqT41eUZCQWbUW+y2gro5iNDuX+fv0EryNufJc/gmgKr1KYVUEtq7bQSOokWDkNkPMX", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product = _t, Machine = _t, #"Event Code" = _t, #"DateTime Start" = _t, #"DateTime End" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"Product", type text}, {"Machine", type text}, {"Event Code", type text}, 
        {"DateTime Start", type datetime}, {"DateTime End", type datetime}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Duration (s)", each Duration.TotalSeconds([DateTime End]-[DateTime Start])),

//added code to group by contiguous event code groups
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Product", "Machine", "Event Code"}, {
        {"DateTime Start", each List.Min([DateTime Start]), type nullable datetime}, 
        {"DateTime End", each List.Max([DateTime End]), type nullable datetime}, 
        {"Duration (s)", each List.Sum([#"Duration (s)"]), type number}, 
        {"Rows Count", each Table.RowCount(_), Int64.Type}},
        GroupKind.Local)
    //    ^^^^^
in
    #"Grouped Rows"

enter image description here

enter image description here

Upvotes: 3

horseyride
horseyride

Reputation: 21428

This is a M version. No idea how it performs with that many records. I stuck in a Table.Buffer at one point where it seems to make sense

Basically, it creates a combined column of Product&Machine&Event, then offsets that one row downward. By comparing the offset to the current row value, we find where the breaks are and place an index. Fill down the index and you have something you can group on.

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Combo", each [Product]&[Machine]&[Event Code]),
#"Changed Type" = Table.Buffer(Table.TransformColumnTypes(#"Added Custom",{{"Product", type text}, {"Machine", type text}, {"Event Code", type text}, {"DateTime Start", type datetime}, {"DateTime End", type datetime}, {"Duration (s)", Int64.Type}})),

// pull combined Product&Machine&Event cell from prior row
shiftedList = {null} & List.RemoveLastN(#"Changed Type"[Combo],1),
custom1 = Table.ToColumns(#"Changed Type") & {shiftedList},
custom2 = Table.FromColumns(custom1,Table.ColumnNames(#"Changed Type") & {"Previous Row"}),

#"Added Custom1" = Table.AddColumn(custom2, "Custom", each if [Previous Row]=null then [Index] else if [Combo]=[Previous Row] then null else [Index]),
#"Filled Down" = Table.FillDown(#"Added Custom1",{"Custom"}),
#"Grouped Rows" = Table.Group(#"Filled Down", {"Custom"}, {
    {"Product", each [Product]{0}   },
    {"Machine", each [Machine]{0}},
    {"Event Code", each [Event Code]{0}},
    {"DateTime Start", each List.Min([DateTime Start]), type datetime},
    {"DateTime End", each List.Max([DateTime End]), type datetime},
    {"Duration (s)", each List.Sum([#"Duration (s)"]), type nullable number},
    {"Rows Count", each Table.RowCount(_), Int64.Type}
}),
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Custom"})
in #"Removed Columns"

enter image description here

Might be faster if you don't include the Duration column in any of this then calculate it separately in a final step

Upvotes: 3

Related Questions