Madhav Thaker

Reputation: 370

Python to PySpark Regex: Converting Strings to a List

My code takes a string and extracts the elements within it to create a list.

Here is an example of such a string:

'["A","B"]'

Here is the Python (pandas) code:

import re

df[column + '_upd'] = df[column].apply(lambda x: re.findall(r'"(.*?)"', x.lower()))

This results in the list ['a', 'b'] (the .lower() call lowercases the string before the matches are extracted).
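A quick, self-contained run of that regex (the sample strings here are assumed for illustration):

```python
import re

samples = ['["A","B"]', '["C","D"]']
# .lower() mirrors the original lambda, so matches come back lowercased
extracted = [re.findall(r'"(.*?)"', s.lower()) for s in samples]
print(extracted)  # [['a', 'b'], ['c', 'd']]
```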

I'm brand new to PySpark and am a bit lost on how to do this. I've seen people use regexp_extract, but that doesn't quite apply to this problem.

Any help would be much appreciated.

Upvotes: 1

Views: 334

Answers (1)

murtihash

Reputation: 8410

You can use regexp_replace and split.

from pyspark.sql import functions as F

# strip [, ], quotes and spaces, then split on commas
df.withColumn("new_col", F.split(F.regexp_replace("col", r'[\[\]" ]', ''), ",")).show()

#+---------+-------+
#|      col|new_col|
#+---------+-------+
#|["A","B"]| [A, B]|
#+---------+-------+

#schema
#root
# |-- col: string (nullable = true)
# |-- new_col: array (nullable = true)
# |    |-- element: string (containsNull = true)
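For intuition, here is the same replace-then-split logic in plain Python (a sketch; parse_bracket_list is a hypothetical helper name, not part of any library):

```python
import re

def parse_bracket_list(s):
    # Same idea as the Spark expression above:
    # strip brackets, quotes and spaces, then split on commas.
    cleaned = re.sub(r'[\[\]" ]', '', s)
    return cleaned.split(',')

print(parse_bracket_list('["A","B"]'))  # ['A', 'B']
```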

Upvotes: 1
