bautrey

Reputation: 75

SQL: Match timestamps with time-only parameter to group and count unique times across multiple days

Using SQL or PySpark, I want to count the number of unique times in a timestamp column across a time frame of 2 months, to see the distribution of how often rows are logged to the table. I know a large proportion of timestamps have the time 00:00:00, but I want to know how large that proportion is compared to other times.

This query groups and counts the most common datetimes, but I need to exclude the date and keep only the time. Apparently, this is not such a common thing to do.

select timestamp,
    count(*) as count
from table_name
where timestamp between '2021-01-01' and '2021-02-28'
group by 1
order by 2 desc

The SQL/PySpark is run against a Spark database in a Zeppelin notebook.

Timestamps look like this: 2021-01-01 02:07:55

Upvotes: 0

Views: 124

Answers (2)

pltc

Reputation: 6082

Depending on the type of your timestamp column, you can extract the hour, minute, and second if it is TimestampType (with lpad to add leading zeros), or use regexp_extract if it is StringType.
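If you are not sure which type you have, a quick check (assuming the column is called ts, as in the snippets below):

df.printSchema()              # shows whether ts is timestamp or string
# or
print(dict(df.dtypes)['ts'])  # e.g. 'timestamp' or 'string'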

from pyspark.sql import functions as F

# if your ts column has TimestampType
(df
    .withColumn('ts', F.col('ts').cast('timestamp')) # assuming ts is (or can be cast to) a timestamp
    .withColumn('time_only', F.concat(
        F.lpad(F.hour('ts'), 2, '0'),
        F.lit(':'),
        F.lpad(F.minute('ts'), 2, '0'),
        F.lit(':'),
        F.lpad(F.second('ts'), 2, '0')
    ))
    .show()
)

# if your ts column is StringType
(df
    .withColumn('ts', F.col('ts').cast('string')) # assuming ts is (or can be cast to) a string
    .withColumn('time_only', F.regexp_extract('ts', r'\d{2}:\d{2}:\d{2}', 0))
    .show()
)

# +-------------------+---------+
# |                 ts|time_only|
# +-------------------+---------+
# |2019-01-15 03:00:00| 03:00:00|
# |2019-01-15 20:00:00| 20:00:00|
# |2019-01-15 19:00:00| 19:00:00|
# |2019-01-15 11:00:00| 11:00:00|
# +-------------------+---------+
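To get the distribution you describe, you can then group and count on the derived column (a minimal follow-up, using the time_only column from either snippet above):

(df
    .groupBy('time_only')
    .count()
    .orderBy(F.desc('count'))
    .show()
)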

Upvotes: 1

James

Reputation: 3015

Maybe something like this?

select 
  date_format(timestamp, "HH:mm:ss") as timeOnly,
  count(*) as count
from table_name
where timestamp between '2021-01-01' and '2021-02-28'
group by date_format(timestamp, "HH:mm:ss")
order by 2 desc

It is not a good idea to name fields with reserved words (timestamp); if the column really is called that, escape it with backticks in Spark SQL.

See the Spark documentation on date_format and its datetime patterns.
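The same idea also works in the DataFrame API if you want to stay in PySpark (a minimal sketch; table_name and the timestamp column are the hypothetical names from the question):

from pyspark.sql import functions as F

(spark.table('table_name')
    .where(F.col('timestamp').between('2021-01-01', '2021-02-28'))
    .groupBy(F.date_format('timestamp', 'HH:mm:ss').alias('timeOnly'))
    .count()
    .orderBy(F.desc('count'))
    .show()
)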

Upvotes: 1
