lunbox

Reputation: 363

Remove all characters apart from numbers in PySpark

I have a column called cola with a string data type, containing values such as "100.z" or "102c".

How do I get rid of all letters and any other characters apart from the digits, so that cola becomes "100" or "102"?

from pyspark.sql.functions import regexp_replace
df = df.withColumn('cola', regexp_replace('cola', 'charsgohere', ''))

Upvotes: 0

Views: 846

Answers (1)

Derek O

Reputation: 19565

You can use the regex [^0-9], which matches any single character that is not a digit, and replace each match with an empty string. For example:

from pyspark.sql import functions as F
df = df.withColumn('cola_cleaned', F.regexp_replace('cola', '[^0-9]', ''))

Result:

+------+------------+
|  cola|cola_cleaned|
+------+------------+
| 100.z|         100|
|  102c|         102|
|x1022-|        1022|
+------+------------+
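Spark's regexp_replace uses Java regular expressions, and for a plain character class like [^0-9] Python's re module behaves the same way, so the pattern can be sanity-checked locally without starting a Spark session. A minimal sketch; the helper name digits_only is hypothetical and not part of PySpark:

```python
import re

# Same character class passed to F.regexp_replace above:
# matches any single character that is not an ASCII digit 0-9.
NON_DIGIT = re.compile(r"[^0-9]")


def digits_only(value: str) -> str:
    """Hypothetical helper: drop every non-digit, like regexp_replace(col, '[^0-9]', '')."""
    return NON_DIGIT.sub("", value)


for raw in ["100.z", "102c", "x1022-"]:
    print(f"{raw!r} -> {digits_only(raw)!r}")
# '100.z' -> '100'
# '102c' -> '102'
# 'x1022-' -> '1022'

# Caveat: a decimal point is also a non-digit, so it is stripped too.
print(digits_only("12.5kg"))  # 125
```

Note that the pattern removes every non-digit, including a decimal point, so a value like "12.5kg" collapses to "125" rather than "12.5".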

Upvotes: 1
