Reputation:
I have a DataFrame like this:
userId someString varA varB
1 "example1" 0,2,5 1,2,9
2 "example2" 1,20,5 9,null,6
I want to convert the data in varA and varB to arrays of strings:
userId someString varA varB
1 "example1" [0,2,5] [1,2,9]
2 "example2" [1,20,5] [9,null,6]
Upvotes: 1
Views: 48
Reputation: 5700
It's fairly simple: you can use the Spark SQL split function.
import org.apache.spark.sql.functions.{col, split}
// split treats its second argument as a regex; "," matches a literal comma
df.withColumn("varA", split(col("varA"), ",")).withColumn("varB", split(col("varB"), ",")).show()
Output
+------+----------+----------+------------+
|userId|someString|      varA|        varB|
+------+----------+----------+------------+
|     1|  example1| [0, 2, 5]|   [1, 2, 9]|
|     2|  example2|[1, 20, 5]|[9, null, 6]|
+------+----------+----------+------------+
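For anyone who wants to try this outside spark-shell, here is a minimal self-contained sketch of the same split approach. The object name SplitExample, the helper toArrays, and the local[*] session settings are my own choices for illustration and are not from the question; the sample rows are copied from it.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, split}

// Hypothetical wrapper object, used only to make the example runnable on its own.
object SplitExample {

  // Replace each named string column with an array<string> column.
  // split takes a regular expression; "," matches a literal comma.
  def toArrays(df: DataFrame, names: Seq[String]): DataFrame =
    names.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, split(col(name), ","))
    }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-example")
      .getOrCreate()
    import spark.implicits._

    // Sample rows from the question; varA and varB start as plain strings.
    val df = Seq(
      (1, "example1", "0,2,5", "1,2,9"),
      (2, "example2", "1,20,5", "9,null,6")
    ).toDF("userId", "someString", "varA", "varB")

    val result = toArrays(df, Seq("varA", "varB"))
    result.printSchema() // varA and varB are now array<string>
    result.show()

    spark.stop()
  }
}
```

Note that every element is still a string after the split, so the null in the second row is the literal text "null", not a SQL NULL.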
Upvotes: 3