Jessi joseph

Reputation: 191

How to select all columns in spark sql query in aggregation function

Hi, I am new to Spark SQL.

I have a query like this:

val highvalueresult = averageDF.select($"tagShortID", $"Timestamp", $"ListenerShortID", $"rootOrgID", $"subOrgID", $"RSSI_Weight_avg").groupBy("tagShortID", "Timestamp").agg(max($"RSSI_Weight_avg").alias("maxAvgValue"))

This prints only 3 columns:

tagShortID,Timestamp,maxAvgValue

But I want to display all the columns along with this one. Any help or suggestion would be appreciated.

Upvotes: 3

Views: 5783

Answers (2)

Daniel de Paula

Reputation: 17872

One alternative, usually a good fit for your specific case, is to use window functions, because it avoids the need to join with the original data:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val windowSpec = Window.partitionBy("tagShortID", "Timestamp")

val result = averageDF.withColumn("maxAvgValue", max($"RSSI_Weight_avg").over(windowSpec))

You can find good articles online explaining the window functions functionality in Spark.

Please note that it requires either Spark 2+ or a HiveContext in Spark versions 1.4 to 1.6.
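If you want to see what the window max does without a Spark shell, its semantics can be sketched with plain Scala collections (the case class and field names below are hypothetical stand-ins for the DataFrame columns): every row keeps all of its columns and simply gains the partition-wide maximum.

```scala
// Plain-Scala sketch of Window.partitionBy("tagShortID", "Timestamp")
// with max("RSSI_Weight_avg").over(windowSpec): each row is kept intact
// and annotated with the max of its partition.
case class Reading(tagShortID: Int, timestamp: Int, rssiWeightAvg: Double)

val rows = Seq(
  Reading(1, 1, 1.0),
  Reading(2, 2, 2.0),
  Reading(2, 2, 1.5)
)

// Group rows by the partition key, compute the max per group,
// then attach it to every row in that group (like withColumn + over).
val withMax: Seq[(Reading, Double)] =
  rows.groupBy(r => (r.tagShortID, r.timestamp)).values.flatMap { group =>
    val maxAvg = group.map(_.rssiWeightAvg).max
    group.map(r => (r, maxAvg))
  }.toSeq
```

Note that, unlike the groupBy/agg version, no columns are dropped and no join is needed afterwards.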

Upvotes: 1

koiralo

Reputation: 23119

Here is a simple example with the column names you have.

This is your averageDF DataFrame with dummy data:

+----------+---------+---------------+---------+--------+---------------+
|tagShortID|Timestamp|ListenerShortID|rootOrgID|subOrgID|RSSI_Weight_avg|
+----------+---------+---------------+---------+--------+---------------+
|         2|        2|              2|        2|       2|              2|
|         2|        2|              2|        2|       2|              2|
|         2|        2|              2|        2|       2|              2|
|         1|        1|              1|        1|       1|              1|
|         1|        1|              1|        1|       1|              1|
+----------+---------+---------------+---------+--------+---------------+

After you have a groupBy and aggregation:

val highvalueresult = averageDF.select($"tagShortID", $"Timestamp", $"ListenerShortID", $"rootOrgID", $"subOrgID", $"RSSI_Weight_avg").groupBy("tagShortID", "Timestamp").agg(max($"RSSI_Weight_avg").alias("maxAvgValue"))

This does not return all the columns you selected, because after a groupBy and aggregation only the grouping and result columns are returned, as below:

+----------+---------+-----------+
|tagShortID|Timestamp|maxAvgValue|
+----------+---------+-----------+
|         2|        2|          2|
|         1|        1|          1|
+----------+---------+-----------+

To get all the columns, you need to join these two DataFrames:

averageDF.join(highvalueresult, Seq("tagShortID", "Timestamp"))
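The aggregate-then-join pattern can also be sketched with plain Scala collections, without a Spark session (the case class and field names are hypothetical stand-ins for the columns): the aggregation keeps only the grouping keys plus the max, and the join restores the remaining columns from the original rows.

```scala
// Plain-Scala sketch of groupBy/agg(max) followed by a join back
// on the grouping keys to recover the other columns.
case class Avg(tagShortID: Int, timestamp: Int,
               listenerShortID: Int, rssiWeightAvg: Double)

val averageRows = Seq(
  Avg(1, 1, 1, 1.0),
  Avg(2, 2, 2, 2.0),
  Avg(2, 2, 3, 1.5)
)

// Step 1: groupBy + max -> only the key columns and maxAvgValue survive,
// like highvalueresult above.
val highValue: Map[(Int, Int), Double] =
  averageRows.groupBy(r => (r.tagShortID, r.timestamp))
    .map { case (key, group) => key -> group.map(_.rssiWeightAvg).max }

// Step 2: join on the key columns to bring back all original columns.
val joined: Seq[(Avg, Double)] =
  averageRows.map(r => (r, highValue((r.tagShortID, r.timestamp))))
```

Each original row reappears with its full set of fields plus the per-key maximum, which is exactly what the `averageDF.join(highvalueresult, ...)` line produces.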

and the final result will be:

+----------+---------+---------------+---------+--------+---------------+-----------+
|tagShortID|Timestamp|ListenerShortID|rootOrgID|subOrgID|RSSI_Weight_avg|maxAvgValue|
+----------+---------+---------------+---------+--------+---------------+-----------+
|         2|        2|              2|        2|       2|              2|          2|
|         2|        2|              2|        2|       2|              2|          2|
|         2|        2|              2|        2|       2|              2|          2|
|         1|        1|              1|        1|       1|              1|          1|
|         1|        1|              1|        1|       1|              1|          1|
+----------+---------+---------------+---------+--------+---------------+-----------+

I hope this clears your confusion.

Upvotes: 1
