Reputation: 285
This field contains a list.
+--------------------+
|      categoryPathId|
+--------------------+
|[summer|Summer, w...|
|     [ab|ba, caa|da]|
|                  []|
|[shop-all|Shop Al...|
+--------------------+
Every value in the list consists of two parts separated by a pipe symbol (|), for example [ab|ba, caa|da].
I want to remove the second part (everything after the pipe) from each value in the list. The expected result for this example is [ab, caa].
Can you help me solve this?
Upvotes: 1
Views: 158
Reputation: 5536
Spark 2.4+
You can use the higher-order function transform to do this:
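from pyspark.sql.functions import expr

# Split each element on a literal pipe and keep only the part before it.
# Python reduces '\\\\|' to '\\|'; Spark SQL's default string-literal parsing then yields the regex \| (an escaped pipe).
df = df.select(expr('''transform(categoryPathId, x -> split(x, '\\\\|')[0])''').alias('categoryPathId1'))
df.show()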
+---------------+
|categoryPathId1|
+---------------+
|         [a, c]|
|         [a, c]|
|         [a, c]|
+---------------+
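If you want to check the per-element logic without a Spark session, here is a minimal plain-Python sketch of what the expression does to a single row's list. The function name keep_before_pipe is hypothetical and not part of PySpark; it uses str.split, which treats the pipe literally, whereas Spark's split takes a regex (which is why the pipe is escaped above).

```python
from typing import List, Optional


def keep_before_pipe(values: Optional[List[str]]) -> Optional[List[str]]:
    """Keep only the text before the first pipe in each element.

    Hypothetical helper mirroring transform(col, x -> split(x, '\\|')[0])
    for one row; a null (None) array is passed through unchanged.
    """
    if values is None:
        return None
    # str.split treats "|" as a literal separator; index 0 is the part
    # before the first pipe (or the whole string if there is no pipe).
    return [v.split("|")[0] for v in values]


if __name__ == "__main__":
    print(keep_before_pipe(["ab|ba", "caa|da"]))  # ['ab', 'caa']
    print(keep_before_pipe([]))                   # []
```

On Spark versions older than 2.4, where transform is not available, a function like this could be wrapped in a UDF with an ArrayType(StringType()) return type, at the cost of slower execution than the built-in expression.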
Upvotes: 2