Reputation: 163
I am new to PySpark. I have a dataframe containing a customer id and a text, with an associated value:
+---+----+-----+
| id|text|value|
+---+----+-----+
|  1| Cat|    5|
|  1| Dog|    4|
|  2| Oil|    1|
+---+----+-----+
I would like to parse the text column based on a list of keywords, and create columns that tell me whether a keyword is in the text field and extract the associated value. The expected result is this:
List_keywords = ["Dog", "Cat"]
Out
+---+----+-----+--------+---------+--------+---------+
| id|text|value|bool_Dog|value_Dog|bool_Cat|value_Cat|
+---+----+-----+--------+---------+--------+---------+
|  1| Cat|    5|       0|        0|       1|        5|
|  1| Dog|    4|       1|        4|       0|        0|
|  2| Oil|    1|       0|        0|       0|        0|
+---+----+-----+--------+---------+--------+---------+
What is the best way to do that? I was thinking of creating a list or a dictionary containing my keywords and iterating over it with a for loop, but I'm sure there is a better way.
Upvotes: 0
Views: 164
Reputation: 1353
You can group on the existing columns and pivot on text, restricted to your keyword list:
import pyspark.sql.functions as F
data = [[1, 'Cat', 5], [1, 'Dog', 4], [2, 'Oil', 1]]
df = spark.createDataFrame(data, ['id', 'text', 'value'])
df.show()
+---+----+-----+
| id|text|value|
+---+----+-----+
| 1| Cat| 5|
| 1| Dog| 4|
| 2| Oil| 1|
+---+----+-----+
keywords = ['Dog','Cat']
(
    df
    .groupby('id', 'text', 'value')
    .pivot('text', keywords)
    .agg(
        F.count('value').alias('bool'),
        F.max('value').alias('value')
    )
    .fillna(0)
    .sort('text')
).show()
+---+----+-----+--------+---------+--------+---------+
| id|text|value|Dog_bool|Dog_value|Cat_bool|Cat_value|
+---+----+-----+--------+---------+--------+---------+
| 1| Cat| 5| 0| 0| 1| 5|
| 1| Dog| 4| 1| 4| 0| 0|
| 2| Oil| 1| 0| 0| 0| 0|
+---+----+-----+--------+---------+--------+---------+
Upvotes: 1