Pi Pi

Reputation: 861

How to deal with array<String> in spark dataframe?

I have a JSON dataset, and it is formatted as:

val data = spark.read.json("user.json").select("user_id", "friends")
data.show()
+--------------------+--------------------+
|             user_id|             friends|
+--------------------+--------------------+
|18kPq7GPye-YQ3LyK...|[rpOyqD_893cqmDAt...|
|rpOyqD_893cqmDAtJ...|[18kPq7GPye-YQ3Ly...|
|4U9kSBLuBDU391x6b...|[18kPq7GPye-YQ3Ly...|
|fHtTaujcyKvXglE33...|[18kPq7GPye-YQ3Ly...|
+--------------------+--------------------+
data: org.apache.spark.sql.DataFrame = [user_id: string, friends: array<string>]

How can I transform it to [user_id: String, friend: String], e.g.:

+--------------------+--------------------+
|             user_id|             friend|
+--------------------+--------------------+
|18kPq7GPye-YQ3LyK...| rpOyqD_893cqmDAt...|
|18kPq7GPye-YQ3LyK...| 18kPq7GPye-YQ3Ly...|
|4U9kSBLuBDU391x6b...| 18kPq7GPye-YQ3Ly...|
|fHtTaujcyKvXglE33...| 18kPq7GPye-YQ3Ly...|
+--------------------+--------------------+

How can I get this dataframe?

Upvotes: 5

Views: 11900

Answers (1)

koiralo

Reputation: 23099

You can use the concat_ws function to join the array of strings into a single string:

data.withColumn("friends", concat_ws("",col("friends")))

concat_ws(java.lang.String sep, Column... exprs) Concatenates multiple input string columns together into a single string column, using the given separator.
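To make the concat_ws option concrete, here is a minimal sketch that runs on a local SparkSession. The sample rows, the object name ConcatFriends, and the helper joinFriends are made up for this example. It also uses "," as the separator instead of "", on the assumption that you want the individual ids to stay distinguishable in the resulting string.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws}

object ConcatFriends {
  // Hypothetical helper: replaces the array<string> column "friends"
  // with a single comma-separated string column of the same name.
  def joinFriends(df: DataFrame): DataFrame =
    df.withColumn("friends", concat_ws(",", col("friends")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-friends")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for user.json
    val data = Seq(
      ("u1", Seq("u2", "u3")),
      ("u2", Seq("u1"))
    ).toDF("user_id", "friends")

    // friends is now a plain string column, one row per user
    joinFriends(data).show()
    spark.stop()
  }
}
```

Note that this keeps one row per user, so it does not produce the one-row-per-friend layout asked for in the question.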

Or you can use a simple UDF to convert the array to a string, as below:

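 import org.apache.spark.sql.functions._
 import spark.implicits._ // needed for the $"..." column syntax

 // Joins the elements of the array into one space-separated string
 val value = udf((arr: Seq[String]) => arr.mkString(" "))

 val newDf = data.withColumn("hobbies", value($"friends"))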

If you want one row per array element for each user, you can use the explode function:

data.withColumn("friends", explode($"friends"))

explode(Column e) Creates a new row for each element in the given array or map column.
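To get exactly the [user_id, friend] layout from the question, the exploded column can be selected under a new name. The following is only a sketch under assumptions: the sample rows, the object name ExplodeFriends, and the helper toFriendRows are invented for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeFriends {
  // Hypothetical helper: emits one (user_id, friend) row per array element.
  def toFriendRows(df: DataFrame): DataFrame =
    df.select(col("user_id"), explode(col("friends")).as("friend"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-friends")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for user.json
    val data = Seq(
      ("u1", Seq("u2", "u3")),
      ("u2", Seq("u1"))
    ).toDF("user_id", "friends")

    val pairs = toFriendRows(data)
    pairs.printSchema() // columns: user_id (string), friend (string)
    pairs.show()
    spark.stop()
  }
}
```

One thing to keep in mind: explode drops users whose array is null or empty. If those users should be kept with a null friend, explode_outer (available since Spark 2.2) can be used in its place.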

If you only need a single value, then as @ramesh suggested, you can take the first element:

data.withColumn("friends", $"friends"(0))
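For completeness, here is a small sketch of the first-element lookup written with getItem(0), which reads the same position as the (0) shorthand above. The sample rows, the object name FirstFriend, the helper withFirstFriend, and the output column name first_friend are all assumptions made for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FirstFriend {
  // Hypothetical helper: adds a string column holding element 0 of "friends".
  def withFirstFriend(df: DataFrame): DataFrame =
    df.withColumn("first_friend", col("friends").getItem(0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("first-friend")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for user.json
    val data = Seq(
      ("u1", Seq("u2", "u3")),
      ("u2", Seq("u1"))
    ).toDF("user_id", "friends")

    // Keeps the original array and adds the first element next to it
    withFirstFriend(data).show()
    spark.stop()
  }
}
```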

Hope this helps!

Upvotes: 5
