Lawrance Amburose

Reputation: 155

Slice the alphanumeric word from a sentence column using PySpark

I want to extract only the alphanumeric word (the one that mixes letters and digits) from a sentence column using PySpark.

For example:

Original text:

[image not available]

Expected results:

[image not available]

Upvotes: 0

Views: 109

Answers (1)

wwnde

Reputation: 26676

You can extract the text between the whitespace characters:

from pyspark.sql import functions as F
df.withColumn('newtext', F.regexp_extract('text', r'\s(.*?)\s', 0)).show()

+---+----------------+-------+
| id|            text|newtext|
+---+----------------+-------+
|  1|ABCD AB12C BCDEF| AB12C |
+---+----------------+-------+
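
The result above keeps the spaces around AB12C because index 0 asks for the whole match rather than the parenthesised group. A minimal sketch of the difference, using Python's re module as a stand-in for Spark's regex engine (an assumption: Spark uses Java regular expressions, but these simple constructs should behave the same way):

```python
import re

text = "ABCD AB12C BCDEF"

# Same pattern as in the answer: whitespace, lazy capture, whitespace
match = re.search(r"\s(.*?)\s", text)

whole_match = match.group(0)  # entire match, surrounding spaces included
captured = match.group(1)     # only the text inside the parentheses

print(repr(whole_match))
print(repr(captured))
```

If the surrounding spaces are unwanted, passing 1 instead of 0 as the last argument of regexp_extract selects the captured group in the same way.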

Following your revised question, extract the word that mixes letters and digits:

df.withColumn('newtext', F.regexp_extract('text', r'([A-Za-z]+\d+[A-Za-z]+|[A-Za-z]+\d+|\d+[A-Za-z]+)', 0)).show()

+---+------------------+-------+
| id|              text|newtext|
+---+------------------+-------+
|  1|  ABCD AB12C BCDEF|  AB12C|
|  2|SE2DC WERDF EWSQSA|  SE2DC|
|  3| REDC SEDX WSDR12 | WSDR12|
+---+------------------+-------+
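
The pattern above can be checked outside Spark. The sketch below runs it against the three sample rows with Python's re module (again an assumption that Java's engine treats these constructs the same way). MIXED_WORD and first_match are hypothetical names that are not part of the answer; MIXED_WORD is a lookahead-based variant that might help if a word alternates letters and digits more than once, which the answer's pattern would cut short.

```python
import re

# Pattern from the answer: letters-digits-letters, letters-digits, or digits-letters
ANSWER_PATTERN = r"([A-Za-z]+\d+[A-Za-z]+|[A-Za-z]+\d+|\d+[A-Za-z]+)"

# Hypothetical variant: a whole word that contains at least one digit
# and at least one letter, in any order
MIXED_WORD = r"\b(?=[A-Za-z]*\d)(?=\d*[A-Za-z])[A-Za-z\d]+\b"


def first_match(pattern, text):
    """Return the first whole match, or '' when nothing matches."""
    found = re.search(pattern, text)
    return found.group(0) if found else ""


rows = ["ABCD AB12C BCDEF", "SE2DC WERDF EWSQSA", " REDC SEDX WSDR12 "]

answer_results = [first_match(ANSWER_PATTERN, row) for row in rows]
variant_results = [first_match(MIXED_WORD, row) for row in rows]

# A word that alternates more than once shows where the two patterns differ
answer_on_long_word = first_match(ANSWER_PATTERN, "X A1B2C Y")
variant_on_long_word = first_match(MIXED_WORD, "X A1B2C Y")

print(answer_results)
print(variant_results)
print(answer_on_long_word, variant_on_long_word)
```

Both patterns agree on the sample rows. Whether the variant is worth using depends on whether such longer mixed words occur in the real data, and it has not been tested inside Spark here.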

Upvotes: 2
