I have a JSON dataset, and it is formatted as:
val data = spark.read.json("user.json").select("user_id", "friends")
data.show()
+--------------------+--------------------+
| user_id| friends|
+--------------------+--------------------+
|18kPq7GPye-YQ3LyK...|[rpOyqD_893cqmDAt...|
|rpOyqD_893cqmDAtJ...|[18kPq7GPye-YQ3Ly...|
|4U9kSBLuBDU391x6b...|[18kPq7GPye-YQ3Ly...|
|fHtTaujcyKvXglE33...|[18kPq7GPye-YQ3Ly...|
+--------------------+--------------------+
data: org.apache.spark.sql.DataFrame = [user_id: string, friends: array<string>]
How can I transform it to [user_id: String, friend: String], e.g.:
+--------------------+--------------------+
| user_id| friend|
+--------------------+--------------------+
|18kPq7GPye-YQ3LyK...| rpOyqD_893cqmDAt...|
|18kPq7GPye-YQ3LyK...| 18kPq7GPye-YQ3Ly...|
|4U9kSBLuBDU391x6b...| 18kPq7GPye-YQ3Ly...|
|fHtTaujcyKvXglE33...| 18kPq7GPye-YQ3Ly...|
+--------------------+--------------------+
How can I get this DataFrame?
You can use the concat_ws function to concatenate the array of strings into a single string:
data.withColumn("friends", concat_ws("",col("friends")))
concat_ws(java.lang.String sep, Column... exprs)
Concatenates multiple input string columns together into a single string column, using the given separator.
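For example, using a comma as the separator (a minimal sketch against the data DataFrame from the question; the commented lines show the resulting schema):
import org.apache.spark.sql.functions._
// friends becomes one comma-separated string instead of an array
val joined = data.withColumn("friends", concat_ws(",", col("friends")))
joined.printSchema()
// root
//  |-- user_id: string
//  |-- friends: string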
Or you can use a simple UDF to convert the array to a string, as below:
import org.apache.spark.sql.functions._
import spark.implicits._
// join the array elements into one space-separated string
val value = udf((arr: Seq[String]) => arr.mkString(" "))
val newDf = data.withColumn("friends", value($"friends"))
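As a design note, the built-in concat_ws is usually preferable to a UDF here: Spark's Catalyst optimizer treats a UDF as a black box, so built-in functions can generally be optimized and tend to perform better.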
If you are trying to get the individual values of the array for each user, then you can use the explode method:
data.withColumn("friends", explode($"friends"))
explode(Column e) Creates a new row for each element in the given array or map column.
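Note that explode keeps the original column name; to get exactly the [user_id: string, friend: string] shape asked for, you can alias the exploded column in a select (a minimal sketch, assuming the data DataFrame from the question):
import org.apache.spark.sql.functions._
import spark.implicits._
// one output row per (user_id, friend) pair; rows with empty arrays produce no output
val pairs = data.select($"user_id", explode($"friends").as("friend"))
pairs.printSchema()
// root
//  |-- user_id: string
//  |-- friend: string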
If you only want a single element then, as @ramesh suggested, you can take the first one:
data.withColumn("friends", $"friends"(0))
Hope this helps!