user1914644
user1914644

Reputation: 107

How to create a column contains words from a string?

I have a dataset with a column(string) containing several titles. If I wanted to do an operation, for example, where I create a new column (colB) doing an operation with contains or like (eg like("%hello%") ), so that if it had discounts in the string it would fill with a discount in colB or of have hello in the string would fill with hello in colB. And if I had both, I would put them both separated by a comma.

Example of the desired table below

colA colB
Hello World hello
Hi World null
Discounts for apples discount
Check this discount! discount
Hello World and discount! discount, hello

How do I go around and create this kind of dataset? Many thanks in advance!

Upvotes: 0

Views: 148

Answers (1)

pltc
pltc

Reputation: 6082

You can use concat_ws to add your text together

from pyspark.sql import functions as F

(df
    .withColumn('b', F.concat_ws(', ', F.array(
        F.when(F.lower('a').like('%hello%'), F.lit('hello')),
        F.when(F.lower('a').like('%discount%'), F.lit('discount')),
    )))
    .show(10, False)
)

+-------------------------+---------------+
|a                        |b              |
+-------------------------+---------------+
|Hello World              |hello          |
|Hi World                 |               |
|Discounts for apples     |discount       |
|Check this discount!     |discount       |
|Hello World and discount!|hello, discount|
+-------------------------+---------------+

Upvotes: 1

Related Questions