Reputation: 374
I want to convert a Spark DataFrame into another DataFrame with a specific manner as follows:
I have Spark DataFrame:
+---------+------------+
|protocol | count |
+---------+------------+
| TCP| 8231 |
| ICMP| 7314 |
| UDP| 5523 |
| IGMP| 4423 |
| EGP| 2331 |
+---------+------------+
And I want to turn it into:
+----------------------------------------------------------+
|Aggregated |
+----------------------------------------------------------+
|{tcp: 8231, icmp: 7314, udp: 5523, igmp: 4423, egp: 2331} |
+----------------------------------------------------------+
The aggregated column can be both list or map, or string. Is this possible with DataFrame functions or do I need to create my own udf to aggregate this ?
Upvotes: 0
Views: 1723
Reputation: 307
Concat columns in dataframe and create a new column:
var new_df = df.withColumn("concat", concat($"protocol", lit(" : "), $"count"))
To aggregate it into a single row as a list you can do this.
var new_df = new_df.groupBy().agg(collect_list("concat").as("aggregated"))
new_df.show
If you want to get the data into a string instead of dataframe, you can collect it as following.
new_df.select("concat").collect.map(x=> x.get(0)).mkString("{", ",", "}")
Upvotes: 0
Reputation: 35219
pivot
and toJSON
will give you what you need
import org.apache.spark.sql.functions.first
df.groupBy().pivot("protocol").agg(first("count")).toJSON.show(false)
// +----------------------------------------------------------+
// |value |
// +----------------------------------------------------------+
// |{"EGP":2331,"ICMP":7314,"IGMP":4423,"TCP":8321,"UDP":5523}|
// +----------------------------------------------------------+
Upvotes: 2
Reputation: 28322
Since you want to convert all columns to a single one and it does not seem to be many columns to begin with, you can collect
the dataframe to the driver and use pure Scala code to convert it into the format you want.
The following will give you a Array[String]
:
val res = df.as[(String, Int)].collect.map{case(protocol, count) => protocol + ": " + count}
To convert it into a single string, simply do:
val str = res.mkString("{", ", ", "}")
Upvotes: 0