pvpkiran

Reputation: 27048

What are series and bucket in InfluxDB?

While trying to understand the different concepts of InfluxDB, I came across this documentation, which compares its terms with those of an SQL database.

An InfluxDB measurement is similar to an SQL database table.
InfluxDB tags are like indexed columns in an SQL database.
InfluxDB fields are like unindexed columns in an SQL database.
InfluxDB points are similar to SQL rows.

But I came across a couple of other terms that I could not clearly understand, and I am wondering whether there is an SQL equivalent for them.

Series
Bucket

From what I understand from the documentation

series is the collection of data that share a retention policy, measurement, and tag set.

Does this mean a series is a subset of the data in a database table? Or is it like a database view?
I could not find any documentation explaining buckets. I guess this is a new concept in the 2.0 release.

Can someone please clarify these two concepts?

Upvotes: 19

Views: 23827

Answers (3)

yoonghm

Reputation: 4625

I have summarized my understanding below:

  • A bucket is a named location, with a retention policy, where time-series data is stored.
  • A series is a logical grouping of data defined by a shared measurement, tag set, and field key.
  • A measurement is similar to an SQL database table.
  • A tag is similar to an indexed column in an SQL database.
  • A field is similar to an unindexed column in an SQL database.
  • A point is similar to an SQL row.

For example, a SQL table workdone:

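+-------------------+--------+---------------------+-----------+
| Email             | Status | time                | Completed |
+-------------------+--------+---------------------+-----------+
| [email protected] | start  | 1636775801000000000 |        76 |
| [email protected] | finish | 1636775868000000000 |       120 |
| [email protected] | start  | 1636775801000000000 |         0 |
| [email protected] | finish | 1636775868000000000 |        20 |
| [email protected] | start  | 1636775801000000000 |        54 |
| [email protected] | finish | 1636775868000000000 |        56 |
+-------------------+--------+---------------------+-----------+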

The columns Email and Status are indexed.

Hence:

  • Measurement: workdone
  • Tags: Email, Status
  • Field: Completed
  • Series (Cardinality = 3 x 2 = 6):
    1. Measurement: workdone; Tags: Email: [email protected], Status: start; Field: Completed
    2. Measurement: workdone; Tags: Email: [email protected], Status: finish; Field: Completed
    3. Measurement: workdone; Tags: Email: [email protected], Status: start; Field: Completed
    4. Measurement: workdone; Tags: Email: [email protected], Status: finish; Field: Completed
    5. Measurement: workdone; Tags: Email: [email protected], Status: start; Field: Completed
    6. Measurement: workdone; Tags: Email: [email protected], Status: finish; Field: Completed
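
The cardinality figure above can be checked with a small Python sketch. The three email values are hypothetical stand-ins for the three distinct addresses in the table, and the `series_key` string layout is only an illustration, not InfluxDB's internal series key format.

```python
from itertools import product

# Hypothetical placeholders for the three distinct email addresses in the table.
emails = ["user_a", "user_b", "user_c"]
statuses = ["start", "finish"]


def series_key(measurement, tags, field_key):
    """Build an illustrative key from measurement, tag set, and field key."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part}#{field_key}"


# One series per unique combination of measurement, tag set, and field key.
keys = {
    series_key("workdone", {"Email": e, "Status": s}, "Completed")
    for e, s in product(emails, statuses)
}

for key in sorted(keys):
    print(key)
print("series cardinality:", len(keys))
```

Adding a second field key would double the number of series under this definition, because the field key is part of what identifies a series.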

Splitting a logical series across multiple buckets may not improve performance, and it may complicate Flux queries, because each query then has to include multiple buckets.

Upvotes: 27

Benyamin Jafari

Reputation: 34046

According to the InfluxDB glossary:

Bucket

A bucket is a named location where time-series data is stored in InfluxDB 2.0. In InfluxDB 1.8+, each combination of a database and a retention policy (database/retention-policy) represents a bucket. Use the InfluxDB 2.0 API compatibility endpoints included with InfluxDB 1.8+ to interact with buckets.
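
As a minimal sketch of the 1.8+ mapping described in the glossary entry, the bucket name is the database name and the retention policy name joined by a slash. The helper name `bucket_name` and the example names `mydb` and `autogen` are assumptions made for illustration.

```python
def bucket_name(database: str, retention_policy: str) -> str:
    """Return the bucket string for a database/retention-policy pair (InfluxDB 1.8+ style)."""
    return f"{database}/{retention_policy}"


# Hypothetical database and retention policy names.
print(bucket_name("mydb", "autogen"))
```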

Series

A logical grouping of data defined by shared measurement, tag set, and field key.

Upvotes: 5

Michael Cox

Reputation: 1301

The InfluxDB documentation that you link to has an example of what a series is, even if it doesn't label it as such. In InfluxDB, you can think of each combination of measurement and tags as being in its own "table". The documentation splits it like this.

This table in SQL:

+---------+---------+---------------------+--------------+
| park_id | planet  | time                | #_foodships  |
+---------+---------+---------------------+--------------+
|       1 | Earth   | 1429185600000000000 |            0 |
|       2 | Saturn  | 1429185601000000000 |            3 |
+---------+---------+---------------------+--------------+

Becomes these two series in InfluxDB:

name: foodships
tags: park_id=1, planet=Earth

----

name: foodships
tags: park_id=2, planet=Saturn

...etc...
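
To make the split concrete, here is a rough Python sketch that groups the two SQL rows above by measurement and tag set. The grouping logic and the `TAG_KEYS` constant are illustrative assumptions, not how InfluxDB stores data internally.

```python
from collections import defaultdict

# The two rows from the SQL table above.
rows = [
    {"park_id": 1, "planet": "Earth", "time": 1429185600000000000, "#_foodships": 0},
    {"park_id": 2, "planet": "Saturn", "time": 1429185601000000000, "#_foodships": 3},
]

# Assumption: park_id and planet are the tags, #_foodships is the field.
TAG_KEYS = ("park_id", "planet")

series = defaultdict(list)
for row in rows:
    # Tag values are strings in InfluxDB, so convert them before building the key.
    key = ("foodships", tuple((k, str(row[k])) for k in TAG_KEYS))
    series[key].append((row["time"], row["#_foodships"]))

for (name, tags), points in series.items():
    print("name:", name, "tags:", dict(tags), "points:", points)
```

Each distinct tag combination ends up under its own key, which is the "one table per combination" picture described above.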

This has implications when you query the data, and it is also the reason for the recommendation to avoid tags whose values have high cardinality. For example, if you had a temperature tag (especially one precise to multiple decimal places), InfluxDB would create a "table" for each potential combination of tag values.
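
The effect of a high-cardinality tag can be illustrated with a short Python sketch. The measurement name `weather`, the `sensor` tag, and the generated readings are all hypothetical; the sketch only counts distinct measurement and tag-set combinations.

```python
# 1000 hypothetical temperature readings, each precise to three decimal places.
readings = [f"{20 + i / 1000:.3f}" for i in range(1000)]

# Temperature stored as a tag: every distinct value creates a new tag set.
series_with_tag = {
    ("weather", ("sensor", "s1"), ("temperature", value)) for value in readings
}

# Temperature stored as a field: the tag set stays the same for every point.
series_with_field = {("weather", ("sensor", "s1")) for _ in readings}

print("temperature as tag:  ", len(series_with_tag), "series")
print("temperature as field:", len(series_with_field), "series")
```

Storing the reading as a field keeps all 1000 points in a single series, while storing it as a tag produces one series per distinct value.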

A bucket is much easier to understand. It's just a combination of a database and a retention policy. In previous versions of InfluxDB these were separate concepts, which have now been combined.

Upvotes: 5
