PySpark regex to get value between a string and hyphen

Question

I am trying to extract the number between the string “line_number:” and the hyphen that follows it, and I am struggling to write a regex or substring expression for this in PySpark. My input data, shown below, is in a column called “whole_text”. Every row contains “line_number:” followed by the number and a hyphen. Is there a way to find “line_number:” and the first hyphen after it, and extract the number in between?

The output should be 121, 3112 and so on in a new column.

Please help.

text:ABC12637-XYZ  line_number:121-ABC:JJ11
header:3AXYZ166-LMN  line_number:3112-GHI:3A1

Answers (1)
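
A minimal sketch of one possible approach, assuming the column is named “whole_text” as in the question and that the number consists only of digits. It uses PySpark's regexp_extract with a single capture group; the helper name add_line_number, the output column name "line_number", and the LINE_NUMBER_PATTERN constant are made up for illustration. The pattern is checked against the two sample rows with plain Python first, since the same expression is valid in both Python's re module and the Java regex engine that Spark uses.

```python
import re

# Capture the digits between "line_number:" and the first hyphen after them.
# The same pattern is valid in Java regex (used by Spark) and in Python's re.
LINE_NUMBER_PATTERN = r"line_number:(\d+)-"


def add_line_number(df, text_col="whole_text", out_col="line_number"):
    """Return df with a new string column holding the extracted number.

    Hypothetical helper; the column names are assumptions from the question.
    """
    # Imported here so the pattern check below runs without Spark installed.
    from pyspark.sql import functions as F

    # regexp_extract returns capture group 1, or "" when the row has no match.
    return df.withColumn(
        out_col, F.regexp_extract(F.col(text_col), LINE_NUMBER_PATTERN, 1)
    )


# Check the pattern on the sample rows from the question, using plain Python.
samples = [
    "text:ABC12637-XYZ  line_number:121-ABC:JJ11",
    "header:3AXYZ166-LMN  line_number:3112-GHI:3A1",
]
extracted = [re.search(LINE_NUMBER_PATTERN, s).group(1) for s in samples]
print(extracted)  # ['121', '3112']
```

With an existing DataFrame df, add_line_number(df) would add the new column. The extracted value is a string, so if an integer is needed it can be cast afterwards, for example with .cast("int") on the new column.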
