emkay
emkay

Reputation: 169

Timeseries database with multi-column aggregation

I am trying to design a system that has time-series data. Here are the requirements of the system:

Here are the constraints I have:

Upvotes: -1

Views: 441

Answers (2)

Sergei Rodionov
Sergei Rodionov

Reputation: 4549

The aggregations you outlined are rather standard. It would be hard to find a database that wouldn't support them.

You could try Axibase TSD. It can efficiently calculate calendar aggregates (e.g. daily totals) with multiple dimensions in the user-defined timezone. This would be useful if the metrics you're collecting are driven by end-user activity:

SELECT date_format(time, "yyyy-MMM-dd", "US/Eastern"), 
  entity AS 'city', SUM(value)
FROM "email.active_sessions"
  WHERE datetime >= current_year
  AND entity.tags.state = 'PA'
GROUP BY entity, PERIOD(1 DAY, "US/Eastern")

The intervals can be conveniently specified with extended keywords/functions. Here's how '(last) week' condition looks like:

WHERE datetime >= previous_week AND datetime < current_week

Refer to SQL docs for additional examples.

ATSD runs on most Linux distributions and can be executed in both stand-along and distributed modes.

Disclosure: I work for Axibase.

Upvotes: 0

Evgeny Lazin
Evgeny Lazin

Reputation: 9423

You can try Akumuli (warning: I'm the author). Akumuli can perform aggregations pretty quickly (sub millisecond) because it's a column-oriented database and it precomputes some aggregates for you. And it can perform all the queries that you need, e.g. if your series looks like this:

${measurement_name} city=${city_name} dept=${department_name} state=${state_name}

And the actual data looks like this (e.g this is a power consumption data measured in watts):

W city=Jersey_City state=New-Jersey dept=1
W city=Jersey_City state=New-Jersey dept=2
W city=Paris state=Iowa dept=1
W city=Paris state=Iowa dept=2
W city=Texas state=Texas dept=1
W city=Texas state=Texas dept=3
W city=Paris state=Arkansas dept=1
W city=Paris state=Arkansas dept=2

You can aggregate all data with dept=1 and dept=2 tags using this query:

{
    "aggregate": { "W": "sum" },
    "range": { "from": "20170501T000000",
               "to": "20170502T000000" },
    "group-by": [ "dept" ],
    "where": { "dept": [1, 2] }
}

You will get the sum for each dept but only departments 1 and 2 will be included.

You can split by the city instead:

{
    "aggregate": { "W": "sum" },
    "range": { "from": "20170501T000000",
               "to": "20170502T000000" },
    "group-by": [ "city" ],
}

You will get sum for every city name (all departments in one city will be joined). Of cause, you can filter by tag using where clause.

If you have several cities with the same name (e.g. Paris in Iowa and Arkansas) you can group values by city and state:

{
    "aggregate": { "W": "sum" },
    "range": { "from": "20170501T000000",
               "to": "20170502T000000" },
    "group-by": [ "city", "state" ],
}

You will get one value for Paris Arkansas and another for Paris Iowa.

Upvotes: 1

Related Questions