Reputation: 18953
Using Power Query "M" language, how would you transform a categorical column containing discrete values into multiple "dummy" columns? I come from the Python world and there are several ways to do this but one way would be below:
>>> import pandas as pd
>>> dataset = pd.DataFrame(list('ABCDACDEAABADDA'),
columns=['my_col'])
>>> dataset
my_col
0 A
1 B
2 C
3 D
4 A
5 C
6 D
7 E
8 A
9 A
10 B
11 A
12 D
13 D
14 A
>>> pd.get_dummies(dataset)
my_col_A my_col_B my_col_C my_col_D my_col_E
0 1 0 0 0 0
1 0 1 0 0 0
2 0 0 1 0 0
3 0 0 0 1 0
4 1 0 0 0 0
5 0 0 1 0 0
6 0 0 0 1 0
7 0 0 0 0 1
8 1 0 0 0 0
9 1 0 0 0 0
10 0 1 0 0 0
11 1 0 0 0 0
12 0 0 0 1 0
13 0 0 0 1 0
14 1 0 0 0 0
Upvotes: 2
Views: 5479
Reputation: 40224
Interesting question. Here's an easy, scalable method I've found:
Create a custom column of all ones (Add Column > Custom Column > Formula = 1).
Add an index column (Add Column > Index Column).
Pivot on the custom column (select my_col
> Transform > Pivot Column).
Replace null values with 0 (select all columns > Transform > Replace Values).
Here's what the M code looks like for this process:
#"Added Custom" = Table.AddColumn(#"Previous Step", "Custom", each 1),
#"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1),
#"Pivoted Column" = Table.Pivot(#"Added Index", List.Distinct(#"Added Index"[my_col]), "my_col", "Custom"),
#"Replaced Value" = Table.ReplaceValue(#"Pivoted Column",null,0,Replacer.ReplaceValue,Table.ColumnNames(#"Pivoted Column"))
Once you've completed the above, you can remove the index column if desired.
Upvotes: 3