Reputation: 13
Good day...I've been handed a very messy dataset to clean...my first thought was PQ. Below is a snippet of the 10k+ lines in the actual .txt
file, it simply repeats for each user ('C:'), separated by the ----------
. As a start, the list is the information that, at the moment, I need to extract. When first importing into PQ, it is one column; tab separated.
"Company-Name." Client No: "5780859"
" " Purchase Order No:
Invoice Date: 30-Sep-23 Unique Invoice No: "123456789"
"12345 Main Street"
"City AB"
"T4A 1B7"
"Account Number 1234567"
"-------------------------------------------------------------------------------"
"REPORT - INDIVIDUAL DETAILS"
--------------
"C:" "1234567890"
"Subscriber Name:" "NAME.NAME SPARE"
"Additional line user name:" ""
"Sublevel:" " "
"Sublevel:" ""
"Reference 1:" ""
"Reference 2:" ""
"Handset Transparency"
"Number/Device Information" ""
"Starting Balance" $0.00
"Last Month's Balance" $0.00
"Current Balance" $0.00
"Monthly Credit" $0.00
"Monthly Balance Adjust" $0.00
"CURRENT CHARGES"
"Monthly Service Plan" $40.00
"Additional Local Airtime" $0.00
"Long Distance Charges" $22.40
"Roaming Charges" $0.00
"Total Taxes:" $7.49
"Total Current Charges:" $69.89
"MONTHLY SERVICE PLAN" 01-Oct-23 to 31-Oct-23
"Service Plan Name" "Total"
"Business SharePro 5GB Q1 offer (01-Oct-23 to 31-Oct-23)" $40.00
"Total Monthly Service Plan Charges" $40.00
"ADDITIONAL LOCAL AIRTIME"
"Service" "Total Airtime" "Free Airtime" "Included Airtime" "Chargeable Airtime" "Total"
"Phone (minutes)" 28:00 0:00 28:00 0:00 $0.00
"Total Additional Local Airtime Charges" $0.00
"LONG DISTANCE CHARGES"
"Service" "Total LD Minutes" "Free LD Minutes" "Included LD Minutes" "Chargeable LD Minutes" "Total"
"Domestic Phone" 28:00 0:00 0:00 28:00 $22.40
"Total Long Distance Charges" $22.40
"ROAMING"
"Service" "Roaming Minutes" "Roaming Charges" "Roaming LD Minutes" "Roaming LD Charges" "Roaming Surcharge" "Total"
"Total Roaming Charges" $0.00
"DO MORE DATA SERVICES"
"Service" "Total Events" "Event Type" "Total"
"Total Do More Data Services Charges" $0.00
"DO MORE VOICE SERVICES"
"Service" "Total Events" "Event Type" "Total"
"Total Do More Voice Services Charges" $0.00
"PAGER SERVICES"
"Service" "Total Messages" "Included Messages" "Chargeable Messages" "Total"
"Total Pager Charges" $0.00
"VALUE-ADDED SERVICES" 01-Oct-23 to 31-Oct-23
"Service" "Total"
"Can - Can/US LD $0.80/min (01-Oct-23 to 31-Oct-23)" $0.00
"Easy Roam INTL - $16/day Business (01-Oct-23 to 31-Oct-23)" $0.00
"Easy Roam US - $14/day Business (01-Oct-23 to 31-Oct-23)" $0.00
"UL Can - Can LD min (01-Oct-23 to 31-Oct-23)" $0.00
"UL domestic SMS / MMS (01-Oct-23 to 31-Oct-23)" $0.00
"Visual Voicemail (01-Oct-23 to 31-Oct-23)" $0.00
"Total Value Added Service Charges" $0.00
"OTHER CHARGES AND CREDIT"
"Charge or Credit" "Total"
"Total Other Charges and Credits" $0.00
"OTHER FEES"
"Service" "Total"
"Other Fees" $0.00
"TAXES"
"" "Total"
"GST" $3.12
"PST - BC" $4.37
"Total Taxes" $7.49
After tons of playing around, I used the following steps;
So, now I have the rows displaying as they should but here's the next problem. Not every user (C:) has the same number of rows (charges) so the data spills over many columns that are no longer where they should be.
Is there any way to solve for that? Is something like this more suited for Python?
Upvotes: 1
Views: 252
Reputation: 12146
See if this helps move you forward...
let
Source = Csv.Document(File.Contents("C:\5780859.txt"),[Delimiter="#(tab)", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Customer", each if [Column1] = "C:" then [Column2] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Customer"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each [Customer] <> null and [Customer] <> ""),
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each [Column1] <> null and [Column1] <> ""),
#"Added Custom1" = Table.AddColumn(#"Filtered Rows1", "Section", each if [Column1] = Text.Upper([Column1]) then [Column1] else null),
#"Filled Down1" = Table.FillDown(#"Added Custom1",{"Section"}),
#"Filtered Rows2" = Table.SelectRows(#"Filled Down1", each ([Section] = "C:" or [Section] = "CURRENT CHARGES" or [Section] = "OTHER CHARGES AND CREDIT" or [Section] = "OTHER FEES" or [Section] = "TAXES")),
#"Filtered Rows3" = Table.SelectRows(#"Filtered Rows2", each ([Column2] <> null and [Column2] <> "") and ([Column1] = "C:" or [Column1] = "Subscriber Name:" or [Section] <> "C:"))
in
#"Filtered Rows3"
Upvotes: 3