Leonard

PySpark: Split and select part of the string column values

How can I select the part of the file path that comes after Dev\ or dev\ in a column of a Spark DataFrame?

Sample rows of the PySpark column:

\\D\Dev\johnny\Desktop\TEST
\\D\Dev\matt\Desktop\TEST\NEW
\\D\Dev\matt\Desktop\TEST\OLD\TEST
\\E\dev\peter\Desktop\RUN\SUBFOLDER\New

Expected Output

johnny\Desktop\TEST
matt\Desktop\TEST\NEW
matt\Desktop\TEST\OLD\TEST
peter\Desktop\RUN\SUBFOLDER\New

I tried the following code:

from pyspark.sql import functions as F

# Split on the literal text "Dev\" and keep the last piece
df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), "Dev\\\\"), -1)
)

It only gives the correct result for rows that contain Dev\ with an uppercase D; rows with dev\ come back unchanged. I'd appreciate any help.
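A plain-Python sketch can show why the original pattern only partly works. It assumes that Python's re.split treats this simple pattern the same way Spark's Java-regex split does (which holds for literal letters and an escaped backslash); last_part is a hypothetical helper standing in for F.element_at(F.split(...), -1).

```python
import re

# Sample values from the question; raw strings keep the backslashes literal.
paths = [
    r"\\D\Dev\johnny\Desktop\TEST",
    r"\\D\Dev\matt\Desktop\TEST\NEW",
    r"\\D\Dev\matt\Desktop\TEST\OLD\TEST",
    r"\\E\dev\peter\Desktop\RUN\SUBFOLDER\New",
]

def last_part(value, pattern):
    """Hypothetical stand-in for F.element_at(F.split(col, pattern), -1)."""
    return re.split(pattern, value)[-1]

# "Dev\\\\" in Python source is the regex Dev\\ , i.e. the literal text "Dev\".
original = "Dev\\\\"
results = [last_part(p, original) for p in paths]

# The lowercase "dev\" row has no match, so split returns the whole string.
print(results[-1])  # \\E\dev\peter\Desktop\RUN\SUBFOLDER\New
```

When the pattern finds no match, split returns a one-element array holding the whole input, so element_at(..., -1) simply hands back the original path.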

Answers (1)

ggordon

Changing the pattern to "[Dd]ev\\\\" fixes this: the character class [Dd] matches both an uppercase and a lowercase d.

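from pyspark.sql import functions as F

# [Dd] accepts "D" or "d", so both "Dev\" and "dev\" act as split points
df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), "[Dd]ev\\\\"), -1)
)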
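As a rough check outside Spark, the plain-Python stand-in below (assuming re.split behaves like Spark's Java-regex split for these patterns) confirms that "[Dd]ev\\\\" produces the expected output for all four sample rows. It also tries an inline (?i) case-insensitive flag, which both Java and Python regex support and which would additionally match DEV\. The helper names last_part and after_first, and the tricky path, are hypothetical.

```python
import re

paths = [
    r"\\D\Dev\johnny\Desktop\TEST",
    r"\\D\Dev\matt\Desktop\TEST\NEW",
    r"\\D\Dev\matt\Desktop\TEST\OLD\TEST",
    r"\\E\dev\peter\Desktop\RUN\SUBFOLDER\New",
]

# Same pattern string as in the answer: [Dd] accepts "D" or "d".
fixed = "[Dd]ev\\\\"
# Alternative: inline case-insensitive flag (also matches "DEV\").
case_insensitive = "(?i)dev\\\\"

def last_part(value, pattern):
    """Stand-in for F.element_at(F.split(col, pattern), -1): last piece."""
    return re.split(pattern, value)[-1]

def after_first(value, pattern):
    """Stand-in for splitting at the first match only (maxsplit=1)."""
    return re.split(pattern, value, maxsplit=1)[-1]

fixed_results = [last_part(p, fixed) for p in paths]
flag_results = [last_part(p, case_insensitive) for p in paths]

# Caveat: a hypothetical path with a second "dev\" folder further down.
tricky = r"\\D\Dev\matt\dev\notes"
print(last_part(tricky, fixed))    # notes
print(after_first(tricky, fixed))  # matt\dev\notes
```

One caveat: element_at(..., -1) keeps only what follows the last match, so a path with a second dev\ folder gets cut short. In Spark 3.0 and later, F.split accepts a limit argument, so F.element_at(F.split(F.col("path"), "[Dd]ev\\\\", 2), -1) should keep everything after the first match instead, which is what after_first mimics with maxsplit=1.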

Let me know if this works for you.