Abhishek Tyagi

Reputation: 54

How to use the split function in Scala with an iteration on a column?

There will be one or more pairs of values in the column1 attribute, separated by semicolons. For each pair, take the value to the right of the hyphen and generate a record for that CompanyCode. For example, if column1 = GT01-5636;GT02-7212, records would be generated for CompanyCode 5636 and 7212. All attributes other than CompanyCode are duplicated.

+---------------------------------------+
|unparsed_data                          |
+---------------------------------------+
|GT01-5636;GT02-7212                    |
|Gx01-5626;GY01-1112;GL01-4336;GT09-0012|
+---------------------------------------+

I want my data in this form:

+-----------+
|CompanyCode|
+-----------+
|5636       |
|7212       |
|5626       |
|1112       |
|4336       |
|0012       |
+-----------+
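The transformation described above is pure string parsing: split on `;`, then keep the part after the hyphen. As a quick sanity check outside Spark, the rule can be sketched in plain Python (the function name extract_codes is illustrative, not part of any Spark API):

```python
def extract_codes(unparsed: str) -> list:
    """Split on ';' into pairs, then keep the part after the hyphen."""
    return [pair.split('-')[1] for pair in unparsed.split(';')]

rows = ["GT01-5636;GT02-7212",
        "Gx01-5626;GY01-1112;GL01-4336;GT09-0012"]

# Flatten all rows into one list of company codes
codes = [code for row in rows for code in extract_codes(row)]
print(codes)  # → ['5636', '7212', '5626', '1112', '4336', '0012']
```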

Upvotes: 0

Views: 158

Answers (2)

samkart

Reputation: 6654

A non-regex way using split and explode. I'm writing it in pyspark, but it'd be fairly similar in Scala.

from pyspark.sql import functions as func

data_sdf. \
    withColumn('data_split', func.split('unparsed_data', ';')). \
    selectExpr('explode(data_split) as split_data'). \
    withColumn('company_code', func.split('split_data', '-')[1]). \
    show()

# +----------+------------+
# |split_data|company_code|
# +----------+------------+
# | GT01-5636|        5636|
# | GT02-7212|        7212|
# | Gx01-5626|        5626|
# | GY01-1112|        1112|
# | GL01-4336|        4336|
# | GT09-0012|        0012|
# +----------+------------+

Upvotes: 0

过过招

Reputation: 4244

You can use the regexp_extract_all function to extract all matching substrings, and then use the explode function to expand them into rows.

val df1 = df.selectExpr("explode(regexp_extract_all(unparsed_data, '(.?+)-(\\\\d+)', 2)) as CompanyCode")
df1.show()
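Note that regexp_extract_all uses Java regex syntax, where the possessive quantifier in (.?+) is legal; either way, it is group 2 (the digits after the hyphen) that is extracted. What group 2 captures can be checked in plain Python with a non-possessive stand-in pattern (the sketch below uses `([^;-]+)-(\d+)` as an illustrative equivalent, not the exact pattern from the answer):

```python
import re

# Non-possessive stand-in for the Java pattern used above;
# group 2 captures the digits after each hyphen.
pattern = re.compile(r'([^;-]+)-(\d+)')

rows = ["GT01-5636;GT02-7212",
        "Gx01-5626;GY01-1112;GL01-4336;GT09-0012"]

codes = [m.group(2) for row in rows for m in pattern.finditer(row)]
print(codes)  # → ['5636', '7212', '5626', '1112', '4336', '0012']
```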

Upvotes: 1
