Martlark

Reputation: 14581

Databricks windows group by using sql

I'm having trouble finding out how to convert a PySpark window group-by statement into its equivalent SQL form.

Example:

  from pyspark.sql.functions import window

  eventsDF \
    .groupBy(
      "deviceId",
      window("eventTime", "10 minutes", "5 minutes")) \
    .count()

It should be something like:

   select window, deviceId, count(deviceId)
   from events
   group by window eventTime 10 minutes 5 minutes, deviceId

Upvotes: 1

Views: 377

Answers (2)

mck

Reputation: 42332

You need to fix some syntax errors: the arguments to window should be wrapped in parentheses, and the durations need to be quoted strings.

select
    window(eventTime, '10 minutes', '5 minutes'),
    deviceId,
    count(deviceId)
from events
group by
    window(eventTime, '10 minutes', '5 minutes'),
    deviceId
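
For reference, a minimal sketch of running this SQL from PySpark, assuming the question's eventsDF is registered as a temporary view named events (both names come from the question; the rest is illustrative):

    # spark is the active SparkSession (available by default on Databricks)
    # register the question's DataFrame under the name used in the SQL above
    eventsDF.createOrReplaceTempView("events")

    counts = spark.sql("""
        select
            window(eventTime, '10 minutes', '5 minutes') as window,
            deviceId,
            count(deviceId) as count
        from events
        group by
            window(eventTime, '10 minutes', '5 minutes'),
            deviceId
    """)

    # should produce the same result as the groupBy/window version in the question
    counts.show(truncate=False)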

Upvotes: 1

Amir Maleki

Reputation: 419

I reckon the result of the following query is what you expected:

select deviceId, count(deviceId)
from events
group by `eventTime`, `10 minutes`,  `5 minutes`, `deviceId`

The GROUP BY will group the rows on all the windows you expect.

Upvotes: 0
