user19814628

Reputation: 37

How to remove a substring in PySpark

I want to keep only the part of a value that comes before (Impressions). For example, if I have the value YouTube TrueView for Reach (Impressions), I need YouTube TrueView for Reach.

Another example is YouTube Bumper (Impressions) --> YouTube Bumper

I am currently using:

validated_df=validated_df.withColumn("MediaNm", when(col("MediaNm").like("%Impressions%"),F.regexp_extract(F.col("MediaNm"), r".*?\(", 0)).otherwise(validated_df.MediaNm))

This gives me a blank value as the result.

Upvotes: 1

Views: 956

Answers (1)

Ric S

Reputation: 9277

If I understood correctly, you just want to remove the string ' (Impressions)'. For this, a regexp_replace is enough:

# Raw string so the regex escapes \( and \) reach Spark unchanged
validated_df.withColumn('MediaNm', F.regexp_replace('MediaNm', r' \(Impressions\)', '')).show(truncate=False)

+--------------------------+
|MediaNm                   |
+--------------------------+
|YouTube TrueView for Reach|
|YouTube Bumper            |
+--------------------------+
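
To sanity-check the pattern without a Spark session, the same regex can be tried with Python's re module. This is only a sketch: strip_impressions and the samples list are illustrative names, not part of the original code, and the pattern is an anchored variant (`\s*\(Impressions\)$`) that assumes the suffix always sits at the end of the value. Spark evaluates regexp_replace with Java's regex engine, but for a pattern this simple the two should behave the same.

```python
import re

# Hypothetical helper for local testing only; in Spark the equivalent
# would be F.regexp_replace with the same pattern string.
# Assumption: "(Impressions)" only ever appears as a trailing suffix.
SUFFIX = re.compile(r"\s*\(Impressions\)$")


def strip_impressions(value: str) -> str:
    """Remove a trailing '(Impressions)' and any whitespace before it."""
    return SUFFIX.sub("", value)


samples = [
    "YouTube TrueView for Reach (Impressions)",
    "YouTube Bumper (Impressions)",
    "YouTube Bumper",  # no suffix: should pass through unchanged
]

cleaned = [strip_impressions(s) for s in samples]
print(cleaned)
```

Because the replacement leaves non-matching values untouched, the when(...).otherwise(...) wrapper from the question is not needed.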

Upvotes: 1
