Rock
Rock

Reputation: 107

How to extract and organize multiple CSV file column values into a single table based on unique identifiers using Excel Power Query?

I am using Excel power query to combine and process data from multiple csv files located in a folder. All csv files have the same column structure (shown below) but a different number of rows or records.

Year Period Week District Route SaleAmt
2024 10 1 123 ABC 1000

Here is the link to the sample data files: sample data files

Once data from all csv files is combined into one table, I create a new column "PK" by concatenating the values in fields "Period", "Week", "District" and "Route".

Next, to get the list of unique "PK" values in the combined data, I group by the "PK" column and aggregate the "SaleAmt" column using the operation "All Rows". This will create a new column containing tables of all rows grouped by the unique "PK" values.

Finally, I need to generate a table structure as follows, which maps the "SaleAmt" value from each csv file to the corresponding "PK" value:

PK Year Period Week District Route File-1 SaleAmt File-2 SaleAmt File-3 SaleAmt
101123ABC 2024 10 1 123 ABC 1000 500 800
11469HBG 2024 11 4 69 HBG null null 999

Note:

  1. Certain values in the "PK" column may not be present in some data files. In that case, the "SaleAmt" column corresponding to the data file should show as "null" or "not found" in the final table.

Here is what I have done so far: (M-code below)

let
    Source = Folder.Files("Folder containing csv files"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    #"Added Custom" = Table.AddColumn(#"Expanded Table Column1", "PK", each Text.Combine({[Period],[Week],[District],[Route]},"")),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"PK"}, {{"SaleAmt", each _, type table [Source.Name=text, Year=text, Period=text, Week=text, District=text, Route=text, SaleAmt=text, PK=text]}})
in
    #"Grouped Rows"

How do I get to the final table structure after creating a unique list of "PK" values? Any help / pointers will be appreciated.

Upvotes: 1

Views: 128

Answers (1)

Sam Nseir
Sam Nseir

Reputation: 12061

Expand then do a pivot.

let
    Source = Folder.Files("Folder containing csv files"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    #"Added Custom" = Table.AddColumn(#"Expanded Table Column1", "PK", each Text.Combine({[Period],[Week],[District],[Route]},"")),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"PK"}, {{"SaleAmt", each _, type table [Source.Name=text, Year=text, Period=text, Week=text, District=text, Route=text, SaleAmt=text, PK=text]}}),
    #"Expanded SaleAmt" = Table.ExpandTableColumn(#"Grouped Rows", "SaleAmt", {"Year", "Period", "Week", "District", "Route", "Source.Name", "SaleAmt"}, {"Year", "Period", "Week", "District", "Route", "File", "SaleAmt"}),
    #"Pivoted Column" = Table.Pivot(#"Expanded SaleAmt", List.Distinct(#"Expanded SaleAmt"[File]), "File", "SaleAmt")
in
    #"Pivoted Column"

Upvotes: 1

Related Questions