DoctorEvil
DoctorEvil

Reputation: 473

best way to store time series data with an identifier in sqlite3

Let's say there are a number of different sensors, all of which save data in a database as they measure it and each sensors can have more entries. I'm looking for the best way to save this data so that select queries could be done fastest possible later. Something like

"CREATE TABLE IF NOT EXISTS DataTable (sensor_id TEXT, measured_value REAL, time_of_measuring REAL)"

could basically work, but I imagine this wouldn't be very fast for selecting. I know about primary keys, but they prevent duplicates, so I can't just put sensor_id as a primary key. I'm basically looking for sqlite equivalent of saving data like this, but in a single table and as one measurement being one row:

data = {"sensor1":[x1,x2,x3], "sensor2":[z1,z2,z3]...}

I imagine something like ˇˇ would work for inserting more than a single value for each sensor, but would that help at all with selecting?

"CREATE TABLE IF NOT EXISTS DataTable (sensor_id TEXT NOT NULL, measured_value REAL, time_of_measuring REAL NOT NULL, PRIMARY KEY(sensor_id, time_of_measuring ))"

Upvotes: 1

Views: 2800

Answers (1)

GMB
GMB

Reputation: 222502

For this time-series data, the relevant primary (or unique) key is probably (time_of_measuring, sensor_id). This is close to what you suggested at the end of your question, but the columns are in reverse order.

Technically, this prevent a sensor from loging two measures at the same point in time, which seems like a relevant business rule for your data.

When it comes to the speed of queries: it highly depends on the query themselves. Say that you have query like:

select sensor_id, measured_val, time_of_measuring
from data_table
where 
    sensor_id = ? 
    and time_of_measuring >= ?
    and time_of_measuring <  ?
order by sensor_id, time_of_measuring
    

This query would take advantage of the primary key index, since the columns are the same as those of the where and order by clauses. You could add the measured_val to the index to make the query even more efficient:

create index data_table_idx1 
    on data_table(sensor_id, time_of_measuring, measured_val);
    

As another example, consider this where clause:

where time_of_measuring >= ? and time_of_measuring <  ?

No predicate on sensor_id, but time_of_measuring is the first column in the index, so the primary key index can be used.

As typical counter-examples, the following where clauses would not benefit the index:

where sensor_id = ?                         -- need an index where `sensor_id` is first
where sensor_id = ? and measured_val >= ?   -- needs an index on "(sensor_id, measured_val)"

Upvotes: 1

Related Questions