Reputation: 11
I have the following input table:
Sales Order | Asset Serial Number | Asset Model | Licence Class | License Type | License Name | Account Name |
---|---|---|---|---|---|---|
10000 | 1234, 5643, 3463 | test-pro | A123 | software | LIC-0002, LIC-0188, LIC-0188, LIC-0013 | ABC |
2000 | 5678, 9846, 5639 | test-pro | A123 | software | LIC-00107, LIC-08608, LIC-009, LIC-0610 | ABC |
Here the screenshot
I need it transformed into form:
.
I tried it first with the Replace function & transponate it but I didn't find a way to add the other empty columns other than do it manually. My second thought was the text-to-column function, didn't work either.
Upvotes: 1
Views: 228
Reputation: 6759
Here two solutions one using Excel formulas and the other one using Power Query. See Explanation section for more information about each approach:
It is possible with excel without using Power Query, but several manipulations are required. On cell I2
put the following formula:
=LET(counts, BYROW(F2:F3, LAMBDA(a, LEN(a) - LEN(SUBSTITUTE(a, ",", "")))), del, "|",
emptyRowsSet, MAP(A2:A3, B2:B3, C2:C3, D2:D3, E2:E3, F2:F3, G2:G3, counts,
LAMBDA(a,b,c,d,e,f,g,cnts, LET(rep, REPT(";",cnts),a&rep &del& b&rep &del& c&rep &del&
d&rep &del& e&rep &del& SUBSTITUTE(f,", ",";") &del& g&rep ))),
emptyRowsSetByCol, TEXTSPLIT(TEXTJOIN("&",,emptyRowsSet), del, "&"),
byColResult, BYCOL(emptyRowsSetByCol, LAMBDA(a, TEXTJOIN(";",,a))),
singleLine, TEXTJOIN(del,,byColResult),
TRANSPOSE(TEXTSPLIT(singleLine,";",del))
)
Here is the output:
A simplified version of previous formula is the following one:
=LET(counts, BYROW(F2:F3, LAMBDA(a, LEN(a) - LEN(SUBSTITUTE(a, ",", "")))), del, "|",
reps, MAKEARRAY(ROWS(A2:G3),COLUMNS(A2:G3), LAMBDA(a,b, INDEX(counts, a,1))),
emptyRowsSetByCol, MAP(A2:G3, reps, LAMBDA(a,b, IF(COLUMN(a)=6,
SUBSTITUTE(a,", ",";"), a&REPT(";",b)))),
byColResult, BYCOL(emptyRowsSetByCol, LAMBDA(a, TEXTJOIN(";",,a))),
singleLine, TEXTJOIN(del,,byColResult),
TRANSPOSE(TEXTSPLIT(singleLine,";",del))
)
The following M Code provides the expected result:
let
Source = Excel.CurrentWorkbook(){[Name="TB_Sales"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Sales Order", type text}}),
#"Split License Name" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"License Name",
Splitter.SplitTextByDelimiter(", ", QuoteStyle.Csv),
let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "License Name"),
ListOfColumns = List.Difference(Table.ColumnNames(#"Split License Name"), {"License Name"}),
RemainingColumns = List.Difference(Table.ColumnNames(#"Changed Type"), ListOfColumns),
RemoveDups = (lst as list) =>
let
concatList = (left as list, right as list) => List.Transform(List.Positions(left), each left{_}&"_"& right{_}),
prefixList = Table.Column(#"Split License Name", "Sales Order"),
tmp = concatList(prefixList, lst),
output = List.Accumulate(tmp, {}, (x, y) => x & {if List.Contains(x, y) then null else y})
in
output,
replaceValues = List.Transform(ListOfColumns, each RemoveDups(Table.Column(#"Split License Name", _))),
#"Added Empty Rows" = Table.FromColumns(
replaceValues & Table.ToColumns(Table.SelectColumns(#"Split License Name", RemainingColumns)),
ListOfColumns & RemainingColumns),
#"Extracted Text After Delimiter" = Table.TransformColumns(#"Added Empty Rows", {{"Sales Order",
each Text.AfterDelimiter(_, "_"), type text}, {"Asset Serial Number", each Text.AfterDelimiter(_, "_"), type text},
{"Asset Model", each Text.AfterDelimiter(_, "_"), type text}, {"Licence Class",
each Text.AfterDelimiter(_, "_"), type text}, {"License Type", each Text.AfterDelimiter(_, "_"), type text},
{"Account Name", each Text.AfterDelimiter(_, "_"), type text}}),
#"Reordered Columns" = Table.ReorderColumns(#"Extracted Text After Delimiter",{"Sales Order", "Asset Serial Number", "Asset Model",
"Licence Class", "License Type", "License Name", "Account Name"})
in
#"Reordered Columns"
And here is the output:
And the corresponding Excel Output:
Here we provide the explanation for each approach: Excel formula and Power Query.
We need to calculate how many empty rows we need to add based on License Name column values. We achieve that via counts
name from LET
:
BYROW(F2:F3, LAMBDA(a, LEN(a) - LEN(SUBSTITUTE(a, ",", ""))))
The output for this case is: {3;3}
, i.e 2x1
array, which represents how many empty rows we need to add for each input row.
Next we need to build the set that includes empty rows. We name it emptyRowsSet
and the calculation is as follow:
MAP(A2:A3, B2:B3, C2:C3, D2:D3, E2:E3, F2:F3, G2:G3, counts,
LAMBDA(a,b,c,d,e,f,g,cnts,
LET(rep, REPT(";",cnts),a&rep &del& b&rep &del& c&rep &del&
d&rep &del& e&rep &del& SUBSTITUTE(f,", ",";") &del& g&rep)))
We use inside MAP
an additional LET
function to avoid repetition of rep
value. Because we want to consider the content of License Name as additional rows we replace the ,
by ;
(we are going to consider this token as a row delimiter). We use del
(|
) as a delimiter that will serve as a column delimiter.
Here would be the intermediate result of emptyRowsSet
:
10000;;;|1234, 5643, 3463;;;|test-pro;;;|A123;;;|software;;;|LIC-0002;LIC-0188;LIC-0188;LIC-0013|ABC;;;
2000;;;|5678, 9846, 5639;;;|test-pro;;;|A123;;;|software;;;|LIC-00107;LIC-08608;LIC-009;LIC-0610|ABC;;;
As you can see additional ;
where added per number of items we have in License Name column per row. In the sample data the number of empty rows to add is the same per row, but it could be different.
The rest is how to accommodate the content of emptyRowsSet
in the way we want. Because we cannot invoke TEXTSPLIT
and BYROW
together because we get #CALC!
(Nested Array error). We need to try to circumvent this.
For example the following produces an error (#CALC!
):
=BYROW(A1:A2,LAMBDA(a, TEXTSPLIT(a,"|")))
where the range A1:A2
has the following: ={"a|b";"c|d"}
. We don't get the desired output: ={"a","b";"c","d"}
. In short the output of BYROW
should be a single column so any LAMBDA
function that expands the columns will not work.
In order to do circumvent that we can do the following:
;
for example. Now we have column delimiter (|
) and row delimiter (;
)TEXTSPLIT
to generate the array (2x2
in this case), identifying the columns and the row via both delimiters.We can do it as follow (showing the output of each step on the right)
=TEXTSPLIT(TEXTJOIN(";",,A1:A2),"|",";") -> 1) "a|b;c|d" -> 2) ={"a","b";"c","d"}
We are using the same idea here (but using &
for joining each row). The name emptyRowsSetByCol
:
TEXTSPLIT(TEXTJOIN("&",,emptyRowsSet), del, "&")
Would produce the following intermediate result, now organized by columns (Table 1):
Sales Order | Asset Serial Number | Asset Model | License Class | License Type | License Name | Account Name |
---|---|---|---|---|---|---|
10000;;; |
1234, 5643, 3463;;; |
test-pro;;; |
A123;;; |
software;;; |
LIC-0002;LIC-0188;LIC-0188;LIC-0013 |
ABC;;; |
2000;;; |
5678, 9846, 5639;;; |
test-pro;;; |
A123;;; |
software;;; |
LIC-00107;LIC-08608;LIC-009;LIC-0610 |
ABC;;; |
Note: The header are just for illustrative purpose, but it is not part of the output.
Now we need to concatenate the information per column and for that we can use BYCOL
function. We name the result: byColResult
of the following formula:
BYCOL(emptyRowsSetByCol, LAMBDA(a, TEXTJOIN(";",,a)))
The intermediate result would be:
Sales Order | Asset Serial Number | Asset Model | License Class | License Type | License Name | Account Name |
---|---|---|---|---|---|---|
10000;;;;2000;;; |
1234, 5643, 3463;;;;5678, 9846, 5639;;; |
test-pro;;;;test-pro;;; |
A123;;;;A123;;; |
software;;;;software;;; |
LIC-0002;LIC-0188;LIC-0188;LIC-0013;LIC-00107;LIC-08608;LIC-009;LIC-0610 |
ABC;;;;ABC;;; |
1x7
array and on each column the content already delimited by ;
(ready for the final split).
Now we need to apply the same idea as before i.e. convert everything to a single string and then split it again.
First we convert everything to a single string and name the result: singleLine
:
TEXTJOIN(del,,byColResult)
Next we need to do the final split:
TRANSPOSE(TEXTSPLIT(singleLine,";",del))
We need to transpose the result because SPLIT
processes the information row by row.
I provided a simplified version of the initial approach which requires less steps, because we can obtain the result of the MAP
function directly by columns.
The main idea is to treat the input range A2:G3
all at once. In order to do that we need to have all the MAP
input arrays of the same shape. Because we need to take into account the number of empty rows to add (;
), we need to build this second array of the same shape. The name reps
, is intended to create this second array as follow:
MAKEARRAY(ROWS(A2:G3),COLUMNS(A2:G3),
LAMBDA(a,b, INDEX(counts, a,1)))
The intermediate output will be:
3|3|3|3|3|3|3
3|3|3|3|3|3|3
which represents a 2x7
array, where on each row we have the number of empty rows to add.
Now the name emptyRowsSetByCol
:
MAP(A2:G3, reps,
LAMBDA(a,b, IF(COLUMN(a)=6, SUBSTITUTE(a,", ",";"),
a&REPT(";",b))))
Produces the same intermediate result as in above Table 1. We treat different the information from column 6 (License Name) replacing the ,
with ;
. For other columns just add as many ;
as empty rows we need to add for each input row. The rest of the formula is just similar to the first approach.
#"Split License Name"
is a standard Power Query (PQ) UI function: Split Column by Delimiter.
To generate empty rows we do it by removing duplicates elements on each column that requires this transformation, i.e. all columns except License Name. We do it all at once identifying the columns that require such transformation. In order to do that we define two lists:
ListOfColumns
: Identifies the columns we are going to do the transformation, because we need to do it in all columns except for License Name. We do it by difference via the PQ function: List.Difference()
.RemainingColumns
: To build back again the table, we need to identify the columns don't require such transformation. We use same idea via List.Difference()
, based on ListOfColumns
list.The user defined function RemoveDups(lst as list)
does the magic of this transformation.
Because we need to remove duplicates, but having unique elements based on each initial row, we use the first column Sales Order as a prefix, so we can "clean" the column within each partition.
In order to do that we define inside of RemoveDups()
function a new user defined function concatList()
to add the first column as prefix.
concatList = (left as list, right as list) =>
List.Transform(List.Positions(left), each left{_}&"-"& right{_}),
we concatenate each element of the lists (row by row) using a underscore delimiter (_
). Later we are going to use this delimiter to remove the first column as prefix added at this point.
To remove duplicates and replace them with null
we use the following logic:
output = List.Accumulate(tmp, {}, (x, y) =>
x & {if List.Contains(x, y) then null else y})
where tmp
is a modified list (lst
) with the first column as prefix.
Now we invoke the List.Transform()
function for all the columns that require the transformation using as transform
(second input argument) the function we just defined previously:
replaceValues = List.Transform(ListOfColumns, each
RemoveDups(Table.Column(#"Split License Name", _))),
#"Added Empty Rows"
represents the step of this calculation and the output will be the following table:
The step #"Extracted Text After Delimiter"
is just to remove the prefix we added and for that we use standard PQ UI Transform->Extract->Text After Delimiter.
Finally we need to reorder the column to put in a way it is expected via the step: #"Reordered Columns"
using PQ UI functionality.
Upvotes: 1