CRSPLK
CRSPLK

Reputation: 95

Power Query Skewed data

I have a problem in power query where my data is coming from a report that is split into pages and some of the pages skew the data to different columns. I think there may be an error based solution, but I would like it to be more redundant and not rely on text vs. number error correction. Mainly because sometimes the data that could be alphabetic in some instances, can be numeric in others. I've prepared a data set that has randomly generated replacements for names and codes. I also had to butcher the data a little to give examples of the different shifts, and to account for records split from different pages.

https://drive.google.com/file/d/0B2qUbAWJXgfyNlByV2RHODJzQjA/view?usp=sharing

There are 12 records in the data set that will eventually contain one row per record. 1st page is the Raw data stripped from the source document. These are Check History records (masked) that need to be moved to a single row per record with separate columns for four specific areas:

[Names, Dates, Check numbers, etc][Earnings][Deductions][Taxes]

Record Info including Names, Dates, Record ID Numbers, and amounts is the fist thing extracted and formatted from the raw data. The steps I applied in NameData and CheckData will show how those records are extracted and formatted, also some of the skewed data in this section was easy to reconcile with merge functions and conditional columns.

Each individual Pay Item (An earning code, Deduction Code, Or Tax Code) is formatted then pivoted to it's own column. You can see an example of this maneuver in the Earnings Query. The PayItemReference query is some basic filters I use as a starting point to My Pay Items. You can see in that Query that the codes will shift from column to column, with Text and Numbers mixed. There can be spaces between the codes and their values, or there can be no space, it can also shift columns completely.

I am working on consolidating codes and their values to regular columns, then I can merge, unpivot, pivot etc to get to the final formatting. I have tried using conditional columns and errors, but there are always small issues with either on the original data set. I just need some fresh eyes and new approaches to the data.

Upvotes: 1

Views: 339

Answers (1)

Eugene
Eugene

Reputation: 1264

This was a challenging task.

First it is good idea to split table back into pages, since column structure for each page is probably unique. Thus I form list of tables, each table for one page. Then I have to process each page: extract column names, add summary information for each row, filter not needed rows, and set column names. This is done for each table in the list by using custom function ConvertTable. Afterwards you just combine resulting tables.

Here:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    AddRowNum = Table.AddColumn(Table.AddIndexColumn(Source, "Index", 1, 1), "RowNum", each Number.Mod([Index]-1, 52)+1, type number),
    CountTables = {1..(Number.RoundUp(Table.RowCount(AddRowNum)/52, 0))},
    ListTables = List.Transform(CountTables, (ListItem)=>Table.SelectRows(AddRowNum, each [Index] > 52 * (ListItem - 1) and [Index] <= 52 * ListItem)),

    ConvertTable = (tbl as table) as table =>
    let
        hdr1 = Table.Transpose(Table.FillDown(Table.Transpose(Table.FromRecords({tbl{6}})), {"Column1"})),
        hdr2 = Table.FromRecords({tbl{7}}),
        ColNames = Table.Transpose(Table.SelectColumns(Table.FirstN(Table.AddColumn(Table.Transpose(Table.Combine({hdr1, hdr2})), "ColumnName", each [Column1] & ": " & [Column2]), 19), {"ColumnName"})),
        AddPayDate = Table.AddColumn(tbl, "Pay Date", each if [RowNum] > 8 and Text.Trim(tbl{[RowNum]-2}[Column9]) = "Pay Date" then [Column9] else null, type date),
        AddPeriodEndDate = Table.AddColumn(AddPayDate, "Period End Date", each if [RowNum] > 8 and Text.Trim(tbl{[RowNum]-2}[Column12]) = "Period End Date" then [Column12] else null, type date),
        AddJobCode = Table.AddColumn(AddPeriodEndDate, "Job Code", each if [RowNum] > 8 and Text.Trim(tbl{[RowNum]-2}[Column14]) = "Job Code" then [Column14] else null, Int64.Type),
        AddCheckInfo = Table.AddColumn(AddJobCode, "Check Info", each if [RowNum] > 8 and Text.Trim([Column1]) = "Check Printed:" then Table.Transpose(Table.SelectRows(Table.Transpose(Table.FromRecords({_})), each [Column1] <> null)) else null),
        ExpandedCheckInfo = Table.ExpandTableColumn(AddCheckInfo, "Check Info", {"Column4", "Column6", "Column8"}, {"Check Amount", "Direct Deposit", "Net"}),
        FillUp = Table.FillUp(ExpandedCheckInfo, {"Column3", "Check Amount", "Direct Deposit", "Net"})//Table.AddColumn(AddJobCode, "tmp2", each if [RowNum] < 9 then "" else (if Text.Trim([Column1]) = "Check Printed:" then (if [Column3] = null then -1 else [Column3]) else null), type text), {"tmp2"}),
        FillDown = Table.FillDown(FillUp, {"Column1", "Column5", "Pay Date", "Period End Date", "Job Code"}),
        AddCheckEEIDfixed = Table.AddColumn(FillDown, "Check:EEID.fixed", each Text.From([Column5]) & ":" & Text.From([Column3]), type text),
        FilteredExtraRows = Table.SelectRows(AddCheckEEIDfixed, each [RowNum] > 8 and Text.Trim([Column1]) <> "Check Printed:" and Text.Trim([Column7]) <> "PerControl" and Text.Trim(tbl{[RowNum]-2}[Column7]) <> "PerControl" and [#"Check:EEID.fixed"] <> null),
        DemotedHeaders = Table.DemoteHeaders(FilteredExtraRows),
        GetColumnNames1 = Table.Combine({Table.FromRecords({DemotedHeaders{0}}), ColNames}),
        GetColumnNames2 = Table.PromoteHeaders(Table.FillDown(GetColumnNames1, Table.ColumnNames(GetColumnNames1))),
        SetColumnNames = Table.PromoteHeaders(Table.Combine({GetColumnNames2, FilteredExtraRows}))
    in
        SetColumnNames,

    ConvertedList = List.Transform(ListTables, (t) => ConvertTable(t)),
    GetWholeTable = Table.Combine(ConvertedList)
in
    GetWholeTable

Upvotes: 2

Related Questions