Encode column of lists into integer in pyspark

Question

I have a pyspark dataframe like this one:

 --------------------
| id | configuration |
|----|---------------|
| 1  | [c1, c2, a1]  |
| 2  | [c1, c2, a1]  |
| 3  | [z1, x6, a8]  |
 --------------------

I want to encode the configuration column into a column of integers, so that identical lists get the same label. This is the desired dataframe:

 -----------------------------
| id | configuration | labels |
|----|---------------|--------|
| 1  | [c1, c2, a1]  |    1   |
| 2  | [c1, c2, a1]  |    1   |
| 3  | [z1, x6, a8]  |    2   |
 -----------------------------

How can I perform this operation?
