David T
David T

Reputation: 2143

MS Power Query - eliminating duplicates in a set of delimited strings

I have Excel data which looks something like this:

Sources   Targets   Routes

Lemons    Chair     A,D
Lemons    Chair     D,F
Oranges   Chair     B,F,G
Oranges   Chair     B,C
Oranges   Door      A,G
Oranges   Door      B,C

I am trying to use Power Query to condense it to this:

Sources   Targets   Routes

Lemons    Chair     A,D,F
Oranges   Chair     B,C,F,G
Oranges   Door      A,B,C,G

That is, for each Source/Target pair, I need to

There is a max of 3 routes in Routes source data. I'm pretty sure I need to split the Routes column into 3 columns, then use Group. But there I get stuck.

Suggestions?

Upvotes: 2

Views: 1113

Answers (1)

MarcelBeug
MarcelBeug

Reputation: 2997

The code below implements the steps you outlined plus a sort on Routes. It doesn't use split columns, but Text.Split.

The SplittedRoutes step was created using some text transform function on the Transform tab, then adjusted to use Text.Split.

Likewise, the GroupedRows step was created with Group By on the Transform tab, using operation All Rows, then adjusted to the code below.

let
    Source = ExcelData,
    SplittedRoutes = Table.TransformColumns(Source,{{"Routes", each Text.Split(_,",")}}),
    ExpandedRoutes = Table.ExpandListColumn(SplittedRoutes, "Routes"),
    RemovedDuplicates = Table.Distinct(ExpandedRoutes, {"Sources", "Targets", "Routes"}),
    GroupedRows = Table.Group(RemovedDuplicates, {"Sources", "Targets"}, {{"Routes", each Text.Combine(List.Sort(_[Routes]),","), type text}})
in
    GroupedRows

Upvotes: 4

Related Questions