Reputation: 45
I have the following DataFrame:
+----+-------+
|item|   path|
+----+-------+
| -b-|  a-b-c|
| -b-|  e-b-f|
| -d-|e-b-d-h|
| -c-|g-h-c-b|
+----+-------+
I want to split the path column on the value of the item column in the same row:
+----+--------+
|item|    path|
+----+--------+
| -b-|  [a, c]|
| -b-|  [e, f]|
| -d-|[e-b, h]|
| -c-|[g-h, b]|
+----+--------+
I've used this UDF:

from pyspark.sql.functions import udf
from pyspark.sql import types as T

split_udf = udf(lambda a, b: a.split(b), T.ArrayType(T.StringType()))
org = org.withColumn('crb_url', split_udf('path', 'item')[0])
It worked very well, but I was wondering whether there's another way to do it with a built-in PySpark function, because whenever I try to join org with another DataFrame or save it as a Delta table, it gives me this error:
AttributeError: 'NoneType' object has no attribute 'split'
Upvotes: 0
Views: 61
Reputation: 669
Use .fillna("") to replace the null values with empty strings before applying the UDF. Like this:

org = org.fillna("").withColumn('crb_url', split_udf('path', 'item')[0])
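The underlying failure is on the Python side of the UDF: Spark passes None to the lambda for a null path cell, and None has no .split method. A pure-Python sketch of the failure and of what fillna("") changes:

```python
# The row-level function the UDF runs (same as the question's lambda)
split_row = lambda a, b: a.split(b)

# A null path arrives as None, so .split raises the reported error
try:
    split_row(None, "-b-")
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'split'

# After fillna(""), the lambda receives "" and returns [''] instead of failing
print(split_row("", "-b-"))
```

If you would rather keep nulls as nulls instead of empty strings, a null guard inside the lambda (return None when the path argument is None) achieves the same without fillna.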
Upvotes: 1