Reputation: 165
How can I select the part of the file path that comes after Dev\ or dev\ from a column in a Spark DataFrame?
Sample rows of the pyspark column:
\\D\Dev\johnny\Desktop\TEST
\\D\Dev\matt\Desktop\TEST\NEW
\\D\Dev\matt\Desktop\TEST\OLD\TEST
\\E\dev\peter\Desktop\RUN\SUBFOLDER\New
Expected output:
johnny\Desktop\TEST
matt\Desktop\TEST\NEW
matt\Desktop\TEST\OLD\TEST
peter\Desktop\RUN\SUBFOLDER\New
I tried to use the code below.
df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), "Dev\\\\"), -1)
)
It only gives the result I want for some of the rows: the row that starts with lowercase dev\ is not split. I'd appreciate any help.
Upvotes: 0
Views: 1556
Reputation: 10035
The following modification uses the character class [Dd], which matches both uppercase and lowercase d, so the split works on Dev\ as well as dev\.
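from pyspark.sql import functions as F

df = df.withColumn(
    "sub_path",
    # [Dd] matches either case; "\\\\" is one literal backslash in the regex
    F.element_at(F.split(F.col("path"), "[Dd]ev\\\\"), -1)
)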
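To check the pattern without a Spark session, it can be tried on the sample rows from the question. This is only a sketch: it uses Python's re module instead of Spark's Java regex engine (the two should behave the same for a pattern this simple), and sub_path is a hypothetical helper name, not a PySpark function.

```python
import re

# Sample rows copied from the question (raw strings keep the backslashes as-is).
paths = [
    r"\\D\Dev\johnny\Desktop\TEST",
    r"\\D\Dev\matt\Desktop\TEST\NEW",
    r"\\D\Dev\matt\Desktop\TEST\OLD\TEST",
    r"\\E\dev\peter\Desktop\RUN\SUBFOLDER\New",
]

# "D" or "d", then "ev", then one literal backslash.
PATTERN = re.compile(r"[Dd]ev\\")


def sub_path(path):
    """Hypothetical helper: split on the pattern and keep the last piece,
    mirroring element_at(split(...), -1)."""
    return PATTERN.split(path)[-1]


sub_paths = [sub_path(p) for p in paths]
for s in sub_paths:
    print(s)

# The original case-sensitive pattern never matches the lowercase row,
# so that row comes back unchanged.
unsplit = re.split(r"Dev\\", paths[3])[-1]
```

Note that taking the last element means a path containing Dev\ or dev\ more than once is cut after the last occurrence, not the first.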
Let me know if this works for you.
Upvotes: 1