SQL: get only sampled data from large dataset

Question

I fetch a large amount of data from the server using this SQL:

SELECT value,DATE_FORMAT(`time`,'%Y-%m-%dT%H:%i:%sZ') AS `time` 
  FROM history WHERE :id=reference AND 
  (time BETWEEN :start AND :end) ORDER BY time LIMIT 100;

The limit is fixed at 100 entries.

But in the given time range there could be 5,000 entries.

Here's my goal: I want to sample these entries at a fixed time interval. For example, if the interval is 60 seconds (let's say it is a parameter), I would receive 100 entries (out of 5,000), with always one minute between consecutive entries.

E.g.

value1,14:40:40
value2,14:41:40
...
value100,16:20:40

Is this doable via SQL? Or do I have to parse through this large data with PHP?

If it is not doable with SQL alone, is it possible to get these 100 entries evenly spread across the 5,000 entries? (So not by time; I'd get fixed entries id1, id50, id100, id150, ..., id5000.) Again, just with SQL.

Thanks!

Thorsten Kettner · Accepted Answer

Just as Kristof says in his answer: order the rows and take every nth row by applying a row number. This is how it is done in MySQL:

select 
  numbered.value,
  date_format(numbered.`time`,'%Y-%m-%dT%H:%i:%sZ') AS `time` 
from
(
  -- number the matching rows 1, 2, 3, ... in time order
  select 
    @row_number := @row_number + 1 as rn,
    history.*
  from history 
  cross join (select @row_number := 0) as t
  where reference = :id and `time` between :start and :end
  order by `time`
) as numbered
cross join 
(
  -- total number of matching rows
  select count(*) as cnt
  from history 
  where reference = :id and `time` between :start and :end
) as rowcount
-- keep every nth row, where n = ceil(cnt / 100)
where mod(numbered.rn - 1, ceil(rowcount.cnt / 100)) = 0;

And this is how the same would look in another DBMS, Oracle for instance, using analytic functions:

select 
  numbered.value,
  to_char(numbered."time",'yyyy-mm-dd hh24:mi:ss') AS "time" 
from
(  
  -- row number in time order plus the total count, in a single pass
  select 
    row_number() over (order by "time") as rown,
    count(*) over () as cnt,
    history.*
  from history
  where reference = :id and "time" between :start and :end
) numbered
where mod(numbered.rown - 1, ceil(numbered.cnt / 100)) = 0;

These queries return at most 100 records, and fewer when the number of matching rows is not a multiple of 100. In MySQL you can also use TRUNCATE(rowcount.cnt / 100, 0) instead of CEIL(rowcount.cnt / 100), thus getting a hundred rows or more, and additionally apply LIMIT 100 to get exactly 100 rows (provided there are at least 100 matching rows).
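
The effect of rounding the step up or down can be checked with a bit of arithmetic. The following is only a rough Python sketch, not part of either query: the function name `sampled_positions` and its parameters are made up for illustration. It mimics the `mod(row_number - 1, step) = 0` filter on row numbers 1..cnt and assumes MySQL's behaviour that MOD(x, 0) yields NULL, so a step of 0 keeps no rows.

```python
import math


def sampled_positions(cnt, target=100, round_up=True):
    """Return the 1-based row numbers kept by mod(row_number - 1, step) = 0.

    cnt      -- number of rows matching the WHERE clause
    target   -- desired number of samples (100 in the queries above)
    round_up -- True mimics CEIL(cnt / target), False mimics truncation
    """
    if cnt == 0:
        return []
    step = math.ceil(cnt / target) if round_up else cnt // target
    if step == 0:
        # Assumption: mirrors MySQL, where MOD(x, 0) is NULL,
        # so the WHERE condition rejects every row.
        return []
    return [n for n in range(1, cnt + 1) if (n - 1) % step == 0]


if __name__ == "__main__":
    # Print: matching rows, rows kept with CEIL, rows kept with truncation.
    for cnt in (5000, 5001, 150, 101, 60):
        kept_ceil = len(sampled_positions(cnt))
        kept_trunc = len(sampled_positions(cnt, round_up=False))
        print(cnt, kept_ceil, kept_trunc)
```

With 5000 matching rows both variants keep exactly 100 rows (row numbers 1, 51, 101, ...). With 5001 rows, rounding up keeps 99 rows and truncation keeps 101. The gap grows for small counts: 101 matching rows give a step of 2 and only 51 rows when rounding up, which is why the truncation variant combined with LIMIT 100 can be preferable when exactly 100 rows are needed.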
