Reputation: 2457
I’m using Spark 2.0. I have a DataFrame with several columns such as id, latitude, longitude, and time. I want to do a groupBy while always keeping ["latitude", "longitude"] together.
Could I do the following?
df.groupBy('id', ['latitude', 'longitude'], 'time')
I want to count the number of records for each user, at each distinct time, and at each distinct ["latitude", "longitude"] location.
Upvotes: 0
Views: 131
Reputation: 1
You can just use:
df.groupBy('id', 'latitude', 'longitude', 'time').agg(...)
This works as expected with no additional steps: grouping by latitude and longitude as separate columns already forms one group per distinct (latitude, longitude) pair, so the two columns effectively stay together.
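For instance, a minimal runnable sketch in PySpark (the sample rows here are invented just for illustration):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Toy data with the question's columns (values made up for illustration)
df = spark.createDataFrame(
    [("1", "33.33", "35.35", "8:00"),
     ("2", "31.33", "39.35", "9:00"),
     ("1", "33.33", "35.35", "8:00")],
    ["id", "latitude", "longitude", "time"])

# One output row per (id, latitude, longitude, time), with the record count
df.groupBy('id', 'latitude', 'longitude', 'time').agg(F.count('*').alias('count')).show()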
Upvotes: 0
Reputation: 15297
You can combine the "latitude" and "longitude" columns into a single column and then use groupBy. The sample below uses Scala.
import org.apache.spark.sql.functions.array
import spark.implicits._  // for toDF and the $ column syntax

val df = Seq(("1", "33.33", "35.35", "8:00"), ("2", "31.33", "39.35", "9:00"), ("1", "33.33", "35.35", "8:00")).toDF("id", "latitude", "longitude", "time")
df.show()
// Combine latitude and longitude into a single array column
val df1 = df.withColumn("lat-long", array($"latitude", $"longitude"))
df1.show()
// Group by the combined column and count the records in each group
val df2 = df1.groupBy("id", "lat-long", "time").count()
df2.show()
The output will look like the following:
+---+--------------+----+-----+
| id|      lat-long|time|count|
+---+--------------+----+-----+
|  2|[31.33, 39.35]|9:00|    1|
|  1|[33.33, 35.35]|8:00|    2|
+---+--------------+----+-----+
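For reference, the same approach should translate to PySpark roughly like this (assuming a DataFrame df with the same columns), using functions.array to build the combined column:

from pyspark.sql import functions as F

# Build the combined array column, then group on it and count
df1 = df.withColumn("lat-long", F.array("latitude", "longitude"))
df1.groupBy("id", "lat-long", "time").count().show()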
Upvotes: 1