user9272398

How to remove punctuation from a text?

I have a very big dataset. How can I remove all punctuation from it in PySpark? For example: , . & \ | - _

Upvotes: 1

Views: 1518

Answers (1)

mck

Reputation: 42352

You can use regexp_replace with a regular expression to remove the punctuation characters you specified:

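import pyspark.sql.functions as F

# Replace every , . & \ | - _ with an empty string in each column,
# keeping the original column names
df2 = df.select(
    [F.regexp_replace(col, r',|\.|&|\\|\||-|_', '').alias(col) for col in df.columns]
)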
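Before running this over a large DataFrame, it can help to sanity-check the pattern on a few sample strings. The sketch below uses Python's re.sub as a local stand-in for regexp_replace, assuming that for simple patterns like these Python's regex syntax behaves the same as the Java regex engine Spark uses. It also shows an equivalent, shorter character class and a hypothetical broader pattern (ALL_ASCII_PUNCT, built from string.punctuation) for when you want to remove all ASCII punctuation rather than just the listed characters.

```python
import re
import string

# Pattern from the answer: an alternation of the characters , . & \ | - _
ALTERNATION = r',|\.|&|\\|\||-|_'

# Equivalent character class; '-' is placed last so it is treated literally
CHAR_CLASS = r'[,.&\\|_-]'

# Hypothetical broader pattern: every character in string.punctuation,
# escaped so that characters like ] ^ - \ are matched literally
ALL_ASCII_PUNCT = '[' + re.escape(string.punctuation) + ']'


def strip_chars(pattern: str, text: str) -> str:
    """Remove every match of pattern from text, like regexp_replace(col, pattern, '')."""
    return re.sub(pattern, '', text)


samples = ['a,b.c', 'x & y | z', r'back\slash', 'snake_case-name', 'Hello! (world)?']
for s in samples:
    print(repr(s), '->', repr(strip_chars(ALTERNATION, s)), repr(strip_chars(ALL_ASCII_PUNCT, s)))
```

The alternation and the character class remove exactly the same characters; the broader pattern additionally strips characters such as ! ( ) ? that the original pattern leaves in place.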

Upvotes: 1

Related Questions