Reputation: 363
I have a column called cola with a string data type, for example "100.z" or "102c".
How do I get rid of all letters and any other characters apart from the digits, so that cola becomes "100" or "102"?
df.withColumn('cola', regexp_replace('cola', 'charsgohere', ''))
Upvotes: 0
Views: 846
Reputation: 19565
You can use the regex [^0-9] to match any non-digit. For example:
from pyspark.sql import functions as F
df.withColumn('cola_cleaned', F.regexp_replace('cola', '[^0-9]', '')).show()
Result:
+------+------------+
|  cola|cola_cleaned|
+------+------------+
| 100.z|         100|
|  102c|         102|
|x1022-|        1022|
+------+------------+
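To see what the pattern does without a Spark session, here is a minimal sketch using Python's built-in re module on the same sample values. The helper name keep_digits and the extra sample "10.5" are made up for illustration. Spark's regexp_replace uses Java regular expressions rather than Python's, but a simple character class like [^0-9] should behave the same way in both.

```python
import re

# Same character class as in the answer: anything that is not 0-9.
NON_DIGIT = re.compile(r'[^0-9]')


def keep_digits(value):
    """Hypothetical helper: strip every non-digit character from a string."""
    return NON_DIGIT.sub('', value)


# The first three values come from the table above; '10.5' is an extra,
# made-up sample showing that a decimal point is removed as well.
samples = ['100.z', '102c', 'x1022-', '10.5']
cleaned = [keep_digits(s) for s in samples]
print(cleaned)  # ['100', '102', '1022', '105']
```

Note that the pattern also drops decimal points and minus signs, so "10.5" becomes "105". If those need to survive, the character class would have to be widened. The cleaned values are still strings.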
Upvotes: 1