PySpark: extracting exactly 4 consecutive numeric digits from a column and returning them in a new column

Question

I am very new to PySpark and have no idea how to do this.

I am trying to extract the four-digit year from a title column.

Some values in the title column are:

Under Ground2(1990) Waterword(1995) Incredible Skate (1991) board That girl 2002”

I am trying to get:

1990 1995 1991 2002

This is what I have tried:

import pyspark.sql.functions as F
from pyspark.sql.functions import regexp_replace, split

movies_DF = movies_DF.withColumn('title', regexp_replace(movies_DF.title, "$", ""))
movies_DF = movies_DF.withColumn('yearOfRelease', F.expr('substring(title, -4)'))

My output column has:

1990

1995

board

2002”

dible

Answers (1)
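
The attempt in the question cannot work as written: in a regular expression "$" is the end-of-string anchor, so regexp_replace(title, "$", "") changes nothing, and substring(title, -4) only returns the last four characters, whatever they are. One possible approach is regexp_extract with a pattern that matches a run of exactly four digits. The following is a minimal sketch, not a definitive solution. It assumes the DataFrame and column names from the question (movies_DF, title, yearOfRelease); the names YEAR_PATTERN, with_release_year and first_year are hypothetical helpers introduced here for illustration.

```python
import re

# Exactly four digits, not preceded or followed by another digit.
# The lookarounds stop the pattern from matching inside a longer number.
YEAR_PATTERN = r"(?<!\d)(\d{4})(?!\d)"


def with_release_year(df, source_col="title", target_col="yearOfRelease"):
    """Return df with target_col holding the first 4-digit run in source_col."""
    # Imported here so the pattern can be checked without a Spark installation.
    from pyspark.sql import functions as F

    # regexp_extract returns capture group 1, or "" when nothing matches.
    return df.withColumn(
        target_col, F.regexp_extract(F.col(source_col), YEAR_PATTERN, 1)
    )


def first_year(title):
    """Plain-Python check of the same pattern, for quick experimentation."""
    match = re.search(YEAR_PATTERN, title)
    return match.group(1) if match else ""
```

With the DataFrame from the question this would be called as movies_DF = with_release_year(movies_DF). Rows without a four-digit run get an empty string, and a title containing several such runs yields only the first one.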