Reputation: 105
I extract data automatically from printed tables into Excel, and generally the solution I use is pretty good. However, because the tables are very complex, it sometimes merges several rows into one due to failed layout recognition. So, it might look like this (| indicates new cell):
Label1|A1|B1|C1
Label2|A2|B2|C2
Label3^Label4|A3^A4|B3^B4|C3^C4
Label5|A5|B5|C5
of course, I need 5 rows:
Label1|A1|B1|C1
Label2|A2|B2|C2
Label3|A3|B3|C3
Label4|A4|B4|C4
Label5|A5|B5|C5
Now, I could do it manually, but: multiply that with tens of thousands of cases... I'm in dire need of an automatable solution - any suggestions?
Upvotes: 0
Views: 234
Reputation: 60344
Some assumptions from your data:
^
^
's other than those used to separate these grouped rows.You can obtain your desired output using Power Query
, available in Windows Excel 2010+ and Office 365 Excel
Data => Get&Transform => From Table/Range
or From within sheet
Home => Advanced Editor
Applied Steps
window, to better understand the algorithm and stepsM Code
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}),
//group by "Column 1"
// This assumes there are no duplicates in Column 1, as shown in your example
#"Grouped Rows" = Table.Group(#"Changed Type", {"Column1"}, {
{"splitRows",(t)=>
let
colNames = Table.ColumnNames(t),
//generate lists of new rows, splitting on the ^
newRows = List.Generate(
()=>[nr=Text.Split(Table.Column(t, colNames{0}){0}, "^"), idx = 0],
each [idx]<List.Count(colNames),
each [nr=Text.Split(Table.Column(t, colNames{[idx]+1}){0}, "^"), idx = [idx]+1],
each [nr]),
newTable = Table.FromColumns(newRows,colNames)
in
newTable}
}),
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Column1"}),
#"Expanded splitRows" = Table.ExpandTableColumn(#"Removed Columns", "splitRows", {"Column1", "Column2", "Column3", "Column4"}, {"Column1", "Column2", "Column3", "Column4"})
in
#"Expanded splitRows"
Upvotes: 2