PySpark: Split and select part of the string column values

Question

How can I select the part of the file path that comes after Dev\ or dev\ in a column of a Spark DataFrame?

Sample rows of the pyspark column:

\D\Dev\johnny\Desktop\TEST
\D\Dev\matt\Desktop\TEST\NEW
\D\Dev\matt\Desktop\TEST\OLD\TEST
\E\dev\peter\Desktop\RUN\SUBFOLDER\New

Expected Output

johnny\Desktop\TEST
matt\Desktop\TEST\NEW
matt\Desktop\TEST\OLD\TEST
peter\Desktop\RUN\SUBFOLDER\New

I tried to use the code below.

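from pyspark.sql import functions as F

# Split on the literal text Dev\ (case-sensitive) and keep the last piece.
df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), r"Dev\\"), -1)
)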

It only gives the result I want for some of the rows. I would appreciate any help.

ggordon · Accepted Answer

F.split takes a regular expression as its pattern, so you can use the character class [Dd] to match both an uppercase and a lowercase d:

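from pyspark.sql import functions as F

df = df.withColumn(
    "sub_path",
    # The raw string r"[Dd]ev\\" is the regex for "Dev" or "dev" plus one literal backslash.
    F.element_at(F.split(F.col("path"), r"[Dd]ev\\"), -1)
)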
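
If you want to sanity-check the pattern without starting a Spark session, here is a small sketch that runs it through Python's re module on the sample rows from the question. It assumes Python's and Java's regex engines treat this simple pattern the same way, which I believe holds for a character class and an escaped backslash. The names PATTERN and sub_path are made up for illustration and are not part of any PySpark API.

```python
import re

# Sample rows copied from the question.
paths = [
    r"\D\Dev\johnny\Desktop\TEST",
    r"\D\Dev\matt\Desktop\TEST\NEW",
    r"\D\Dev\matt\Desktop\TEST\OLD\TEST",
    r"\E\dev\peter\Desktop\RUN\SUBFOLDER\New",
]

# Regex: "D" or "d", then "ev", then one literal backslash.
# (Hypothetical name, used only in this sketch.)
PATTERN = r"[Dd]ev\\"


def sub_path(path: str) -> str:
    """Return the text after the last match of PATTERN.

    Taking the last piece of the split mirrors element_at(split(...), -1).
    If the pattern does not match, the whole input comes back unchanged.
    """
    return re.split(PATTERN, path)[-1]


results = [sub_path(p) for p in paths]
for result in results:
    print(result)
```

Because the last piece of the split is kept, a path that contains Dev\ more than once would be cut at the last occurrence, not the first.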

Let me know if this works for you.
