newleaf

Reputation: 2457

Spark groupBy on several columns at the same time

I'm using Spark 2.0. I have a DataFrame with several columns, such as id, latitude, longitude, and time. I want to do a groupBy while always keeping ["latitude", "longitude"] together.

Could I do the following?

df.groupBy('id', ["latitude", "longitude"], 'time')

I want to count the number of records for each user, at each different time, at each different location ["latitude", "longitude"].

Upvotes: 0

Views: 131

Answers (2)

user7322570

Reputation: 1

You can just use:

df.groupBy('id', 'latitude', 'longitude', 'time').agg(...)

This works without any extra steps: grouping by latitude and longitude as separate keys produces exactly the same groups as grouping by the (latitude, longitude) pair.
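For the count in the question, a minimal PySpark sketch could look like the following (the sample rows are made up for illustration, and the SparkSession setup is assumed):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# hypothetical sample data mirroring the question's columns
df = spark.createDataFrame(
    [("1", "33.33", "35.35", "8:00"),
     ("2", "31.33", "39.35", "9:00"),
     ("1", "33.33", "35.35", "8:00")],
    ["id", "latitude", "longitude", "time"])

# one row per (id, latitude, longitude, time) with the record count
df.groupBy("id", "latitude", "longitude", "time").agg(F.count("*").alias("count")).show()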

Upvotes: 0

abaghel

Reputation: 15297

You can combine the "latitude" and "longitude" columns into a single column and then use groupBy on it. The sample below uses Scala.

import org.apache.spark.sql.functions.array
// in a standalone app also import spark.implicits._ (for toDF and $); spark-shell imports it automatically

val df = Seq(("1","33.33","35.35","8:00"), ("2","31.33","39.35","9:00"), ("1","33.33","35.35","8:00"))
  .toDF("id","latitude","longitude","time")
df.show()
// put latitude and longitude into one array column so they always travel together
val df1 = df.withColumn("lat-long", array($"latitude", $"longitude"))
df1.show()
// group on id, the combined lat-long column, and time, then count the rows per group
val df2 = df1.groupBy("id", "lat-long", "time").count()
df2.show()

The output will look like this:

+---+--------------+----+-----+
| id|      lat-long|time|count|
+---+--------------+----+-----+
|  2|[31.33, 39.35]|9:00|    1|
|  1|[33.33, 35.35]|8:00|    2|
+---+--------------+----+-----+
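Since the question uses PySpark, a rough Python equivalent of the same idea (my translation, not part of the original answer, assuming df is the DataFrame from the question) would be:

from pyspark.sql import functions as F

# combine latitude and longitude into one array column, then group on it
df.withColumn("lat-long", F.array("latitude", "longitude")) \
  .groupBy("id", "lat-long", "time") \
  .count() \
  .show()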

Upvotes: 1
