Jarad
Jarad

Reputation: 18953

Transform categorical column into dummy columns using Power Query M

Using Power Query "M" language, how would you transform a categorical column containing discrete values into multiple "dummy" columns? I come from the Python world and there are several ways to do this but one way would be below:

>>> import pandas as pd
>>> dataset = pd.DataFrame(list('ABCDACDEAABADDA'),
               columns=['my_col'])
>>> dataset
   my_col
0       A
1       B
2       C
3       D
4       A
5       C
6       D
7       E
8       A
9       A
10      B
11      A
12      D
13      D
14      A
>>> pd.get_dummies(dataset)
    my_col_A  my_col_B  my_col_C  my_col_D  my_col_E
0          1         0         0         0         0
1          0         1         0         0         0
2          0         0         1         0         0
3          0         0         0         1         0
4          1         0         0         0         0
5          0         0         1         0         0
6          0         0         0         1         0
7          0         0         0         0         1
8          1         0         0         0         0
9          1         0         0         0         0
10         0         1         0         0         0
11         1         0         0         0         0
12         0         0         0         1         0
13         0         0         0         1         0
14         1         0         0         0         0

Upvotes: 2

Views: 5479

Answers (1)

Alexis Olson
Alexis Olson

Reputation: 40224

Interesting question. Here's an easy, scalable method I've found:


  1. Create a custom column of all ones (Add Column > Custom Column > Formula = 1).

  2. Add an index column (Add Column > Index Column).

  3. Pivot on the custom column (select my_col > Transform > Pivot Column).

  4. Replace null values with 0 (select all columns > Transform > Replace Values).


Here's what the M code looks like for this process:

    #"Added Custom" = Table.AddColumn(#"Previous Step", "Custom", each 1),
    #"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1),
    #"Pivoted Column" = Table.Pivot(#"Added Index", List.Distinct(#"Added Index"[my_col]), "my_col", "Custom"),
    #"Replaced Value" = Table.ReplaceValue(#"Pivoted Column",null,0,Replacer.ReplaceValue,Table.ColumnNames(#"Pivoted Column"))

Once you've completed the above, you can remove the index column if desired.

Upvotes: 3

Related Questions