grxthy
grxthy

Reputation: 55

Writing a query that can have excel fill in and consolidate unpopulated fields within a group of duplicates?

I have a contact info dataset (large) that contains lots of semi-duplicate rows that I'd like to condense into as little rows as possible. Attached is a sample of what I'm talking about.

data sample

The blue table on the left is smaller scale example of what I'm currently working with. The orange table on the right is what I would like the table to look like.

I want to write a query that will be able to select an ID that has multiple rows, and within that selection, assess whether values can be moved into a parent row that has unpopulated cells (see ID "4" and how I condensed those three rows of data into one by filling in blanks and consolidating duplicates).

An important point of emphasis is how to perform this task without it being a blanket statement for all duplicates in the entire worksheet. Ultimately I want to perform this task for the entire worksheet, but I want excel to first isolate a single ID and then execute the aforementioned task, rather than evaluating the criteria based on all duplicate IDs. ((If that makes sense))

One other condition I would like to have is for certain columns where multiple rows under the same ID have different values, is to allocate that data into a subsequent column (see Tags & Tags2 columns under ID "1") instead of overriding a cell.

This sounds like a task for Power Query, but my knowledge is limited in that realm. Any help on how to construct a query that accomplish this task is much appreciated. Thanks.

Upvotes: 0

Views: 126

Answers (2)

horseyride
horseyride

Reputation: 21393

This seems to work fine

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Title", type text}, {"Company", type text}, {"Phone", type text}, {"Phone2", type any}, {"Street Address", type any}, {"City", type text}, {"Tags", type text}}),
// group, then unpivot, remove duplicates
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"Data", each Table.Distinct(Table.UnpivotOtherColumns(_, {"ID"}, "Attribute", "Value"), {"Attribute", "Value"}), type table}}),
// combine all the tags into one cell for later splitting
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Group([Data], {"ID", "Attribute"}, {{"Data", each Text.Combine([Value],","), type text}})),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Attribute", "Data"}, {"Attribute", "Data.1"}),
// replace null with Title to preserve rows with no data
#"Replaced Value" = Table.ReplaceValue(#"Expanded Custom",null,"Title",Replacer.ReplaceValue,{"Attribute"}),
#"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Data"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Data.1"),
// split the Tags column into any number of columns as needed
#"Replaced Value1" = Table.ReplaceValue(#"Pivoted Column",null,"xxx",Replacer.ReplaceValue,{"Tags"}),
DynamicColumnList  = List.Transform({1 ..List.Max(Table.AddColumn(#"Replaced Value1","Custom", each List.Count(Text.PositionOfAny([Tags],{","},Occurrence.All)))[Custom])+1}, each "Tags." & Text.From(_)),
#"Split Column by Delimiter" =  Table.SplitColumn(   #"Pivoted Column", "Tags",  Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), DynamicColumnList)
in  #"Split Column by Delimiter"

Upvotes: 2

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60464

You can get your desired output from Power Query using just the Table.Group function.

I have assumed:

  • Output columns are only as you show
  • Input columns don't have anything in Phone2 and Tags2
    • If that is not the case, simple modifications are possible
  • If there are more distinct entities that columns for the output, they will be output in a single column concatenated.
    • In other words, if you had three tags; the first one would be in the Tags column and the second and third concatenated with a comma in the Tags 2 column.
    • I did it this way because, since you show no examples, I'm not quite sure how you want things lined up if you have, for example, multiple phones and multiple tags.

Note: If you want to restrict this to just one ID, just insert a filtering step at the beginning

M Code

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Title", type text}, {"Company", type text}, {"Phone", type text}, {"Phone2", type any}, {"Street Address", type text}, {"City", type text}, {"Tags", type text}, {"Tags2", type any}}),

//Group by ID then
//Depending on how many columns available in results table, will 
//either concatenate, multiple non-duplicate rows, or put them in separate columns
    #"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {
        {"Title", each Text.Combine(List.Distinct([Title]),", ")},
        {"Company", each Text.Combine(List.Distinct([Company]),", ")},
        {"Phone", each try List.RemoveNulls([Phone]){0} otherwise null},
        {"Phone 2", each Text.Combine(List.RemoveFirstN(List.RemoveNulls(List.Distinct([Phone])),1),", ")},
        {"City", each Text.Combine(List.Distinct([City]),", ")},
        {"Tags", each try List.RemoveNulls([Tags]){0} otherwise null},
        {"Tags 2", each Text.Combine(List.RemoveFirstN(List.RemoveNulls(List.Distinct([Tags])),1),", ")}      
    })
in
    #"Grouped Rows"

enter image description here

Upvotes: 1

Related Questions