Reputation: 55
I have a contact info dataset (large) that contains lots of semi-duplicate rows that I'd like to condense into as little rows as possible. Attached is a sample of what I'm talking about.
The blue table on the left is smaller scale example of what I'm currently working with. The orange table on the right is what I would like the table to look like.
I want to write a query that will be able to select an ID that has multiple rows, and within that selection, assess whether values can be moved into a parent row that has unpopulated cells (see ID "4" and how I condensed those three rows of data into one by filling in blanks and consolidating duplicates).
An important point of emphasis is how to perform this task without it being a blanket statement for all duplicates in the entire worksheet. Ultimately I want to perform this task for the entire worksheet, but I want excel to first isolate a single ID and then execute the aforementioned task, rather than evaluating the criteria based on all duplicate IDs. ((If that makes sense))
One other condition I would like to have is for certain columns where multiple rows under the same ID have different values, is to allocate that data into a subsequent column (see Tags & Tags2 columns under ID "1") instead of overriding a cell.
This sounds like a task for Power Query, but my knowledge is limited in that realm. Any help on how to construct a query that accomplish this task is much appreciated. Thanks.
Upvotes: 0
Views: 126
Reputation: 21393
This seems to work fine
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Title", type text}, {"Company", type text}, {"Phone", type text}, {"Phone2", type any}, {"Street Address", type any}, {"City", type text}, {"Tags", type text}}),
// group, then unpivot, remove duplicates
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"Data", each Table.Distinct(Table.UnpivotOtherColumns(_, {"ID"}, "Attribute", "Value"), {"Attribute", "Value"}), type table}}),
// combine all the tags into one cell for later splitting
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Group([Data], {"ID", "Attribute"}, {{"Data", each Text.Combine([Value],","), type text}})),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Attribute", "Data"}, {"Attribute", "Data.1"}),
// replace null with Title to preserve rows with no data
#"Replaced Value" = Table.ReplaceValue(#"Expanded Custom",null,"Title",Replacer.ReplaceValue,{"Attribute"}),
#"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Data"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Data.1"),
// split the Tags column into any number of columns as needed
#"Replaced Value1" = Table.ReplaceValue(#"Pivoted Column",null,"xxx",Replacer.ReplaceValue,{"Tags"}),
DynamicColumnList = List.Transform({1 ..List.Max(Table.AddColumn(#"Replaced Value1","Custom", each List.Count(Text.PositionOfAny([Tags],{","},Occurrence.All)))[Custom])+1}, each "Tags." & Text.From(_)),
#"Split Column by Delimiter" = Table.SplitColumn( #"Pivoted Column", "Tags", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), DynamicColumnList)
in #"Split Column by Delimiter"
Upvotes: 2
Reputation: 60464
You can get your desired output from Power Query using just the Table.Group
function.
I have assumed:
Phone2
and Tags2
Tags
column and the second and third concatenated with a comma in the Tags 2
column.Note: If you want to restrict this to just one ID, just insert a filtering step at the beginning
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Title", type text}, {"Company", type text}, {"Phone", type text}, {"Phone2", type any}, {"Street Address", type text}, {"City", type text}, {"Tags", type text}, {"Tags2", type any}}),
//Group by ID then
//Depending on how many columns available in results table, will
//either concatenate, multiple non-duplicate rows, or put them in separate columns
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {
{"Title", each Text.Combine(List.Distinct([Title]),", ")},
{"Company", each Text.Combine(List.Distinct([Company]),", ")},
{"Phone", each try List.RemoveNulls([Phone]){0} otherwise null},
{"Phone 2", each Text.Combine(List.RemoveFirstN(List.RemoveNulls(List.Distinct([Phone])),1),", ")},
{"City", each Text.Combine(List.Distinct([City]),", ")},
{"Tags", each try List.RemoveNulls([Tags]){0} otherwise null},
{"Tags 2", each Text.Combine(List.RemoveFirstN(List.RemoveNulls(List.Distinct([Tags])),1),", ")}
})
in
#"Grouped Rows"
Upvotes: 1