Reputation: 155
I want to extract only the alphanumeric word (the one containing both letters and digits) from a sentence column using PySpark.
For Example,
Original text:
Expected results:
Upvotes: 0
Views: 109
Reputation: 26676
You can extract the text between the first two whitespace characters.
from pyspark.sql import functions as F
df.withColumn('newtext', F.regexp_extract('text', r'\s(.*?)\s', 0)).show()
+---+----------------+-------+
| id|            text|newtext|
+---+----------------+-------+
|  1|ABCD AB12C BCDEF| AB12C |
+---+----------------+-------+
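Note that group index 0 returns the whole match, including the surrounding whitespace, which is why the value above is padded. The following is a minimal sketch using Python's `re` module, only to illustrate the grouping; Spark evaluates the pattern with Java's regex engine, which handles this particular pattern the same way.

```python
import re

# Same pattern as in the regexp_extract call above.
pattern = re.compile(r'\s(.*?)\s')
match = pattern.search('ABCD AB12C BCDEF')

whole_match = match.group(0)  # group 0: entire match, whitespace included
captured = match.group(1)     # group 1: only the text inside the parentheses

print(repr(whole_match))  # ' AB12C '
print(repr(captured))     # 'AB12C'
```

Passing 1 instead of 0 as the last argument of `regexp_extract` should therefore return the word without the padding.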
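Following your revised question: use an alternation that covers each ordering of letters and digits (letters-digits-letters, letters-digits, digits-letters).
df.withColumn('newtext', F.regexp_extract('text', r'([A-Za-z]+\d+[A-Za-z]+|[A-Za-z]+\d+|\d+[A-Za-z]+)', 0)).show()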
+---+------------------+-------+
| id|              text|newtext|
+---+------------------+-------+
|  1|  ABCD AB12C BCDEF|  AB12C|
|  2|SE2DC WERDF EWSQSA|  SE2DC|
|  3| REDC SEDX WSDR12 | WSDR12|
+---+------------------+-------+
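The alternation can be checked locally before running it on a cluster. The sketch below applies the same pattern to the three sample rows with Python's `re` module. The helper name `extract_alnum` is hypothetical and not part of PySpark; it mimics `regexp_extract` by returning an empty string when nothing matches.

```python
import re

# Same alternation as in the regexp_extract call above.
PATTERN = re.compile(r'([A-Za-z]+\d+[A-Za-z]+|[A-Za-z]+\d+|\d+[A-Za-z]+)')

def extract_alnum(text):
    """Return the first mixed letter/digit token, or '' if there is none."""
    m = PATTERN.search(text)
    return m.group(0) if m else ''

rows = ['ABCD AB12C BCDEF', 'SE2DC WERDF EWSQSA', 'REDC SEDX WSDR12 ']
results = [extract_alnum(r) for r in rows]
print(results)  # ['AB12C', 'SE2DC', 'WSDR12']
```

One caveat: a word that switches between letters and digits more than twice, such as `A1B2`, would be matched only partially (`A1B`), so the pattern may need extending if such values can occur.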
Upvotes: 2