venuktan

Reputation: 1659

hbase openTSDB auto delete data after a certain time

I am using OpenTSDB to store time series data, with HBase as the storage system.

I was wondering if there is a way to reduce the resolution of the data after a certain time?

What I mean by reducing the resolution is this: say the data originally comes in at a time resolution of 1/sec. After about 6 months it no longer makes sense to store the data at that resolution. I would like to reduce it to 1/min, i.e. delete the 59 other data points in each minute.
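To make the intended thinning concrete, here is a minimal Python sketch that works on plain in-memory data, not on HBase or OpenTSDB. The function name `thin_to_one_per_minute` and the `(unix_seconds, value)` tuple layout are assumptions made for illustration only.

```python
def thin_to_one_per_minute(points, cutoff_ts):
    """Thin old data from 1/sec to 1/min.

    points: iterable of (unix_seconds, value) pairs, in any order.
    Points at or after cutoff_ts are kept untouched; for older points,
    only the earliest point of each minute survives.
    """
    kept = []
    seen_minutes = set()
    for ts, value in sorted(points, key=lambda p: p[0]):
        if ts >= cutoff_ts:
            kept.append((ts, value))  # recent data keeps full resolution
            continue
        minute = ts // 60  # bucket index of the minute this point falls in
        if minute not in seen_minutes:
            seen_minutes.add(minute)
            kept.append((ts, value))  # first point of the minute is kept
    return kept


# Three minutes of 1/sec data; everything before t=120 counts as "old".
points = [(t, float(t)) for t in range(180)]
thinned = thin_to_one_per_minute(points, cutoff_ts=120)
print(len(thinned))  # 62: two thinned minutes plus 60 recent points
```

Keeping the first point of each minute matches the "delete the 59 other points" wording; averaging the minute instead would be a different (also reasonable) choice.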

Is there a package for HBase or OpenTSDB that does this?

Thank you for the help.

Upvotes: 4

Views: 1969

Answers (2)

Nabeel Ahmed

Reputation: 19272

For deleting data after a certain period, HBase has a setting for this: TTL (time to live).

ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached. This applies to all versions of a row - even the current one. The TTL time encoded in the HBase for the row is specified in UTC.


Since you're using OpenTSDB on top of HBase, this is pretty simple. OpenTSDB creates four tables: tsdb, tsdb-meta, tsdb-uid, and tsdb-tree. Of these, tsdb is the single huge table where OpenTSDB stores all the data, so to set an expiry time we only need to alter the configuration of the tsdb table.

As the excerpt from the docs (above) says, TTL is set per column family. tsdb has a single column family, t, which is there to fulfill the bare minimum: HBase requires a table to have at least one column family.

You can check the current value for the TTL, via shell:

hbase> describe 'tsdb'

Table tsdb is ENABLED
tsdb, {NAME => 't', VERSIONS => 1, COMPRESSION => 'NONE', TTL => 'FOREVER'}

To set the TTL from the HBase shell:

hbase> alter 'tsdb', NAME => 't', TTL => 8640000

8640000 seconds equals 100 days (roughly 3 months).
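If you want a different retention period, the TTL value is just days times 86400. A small Python sketch of that arithmetic follows; the helper names `ttl_seconds` and `alter_command` are made up for illustration, and the generated string simply mirrors the alter command shown above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def ttl_seconds(days):
    """Convert a retention period in days to an HBase TTL in seconds."""
    return days * SECONDS_PER_DAY


def alter_command(days, table="tsdb", family="t"):
    """Format the HBase shell command for the given retention period."""
    return f"alter '{table}', NAME => '{family}', TTL => {ttl_seconds(days)}"


print(ttl_seconds(100))    # 8640000
print(ttl_seconds(180))    # 15552000 (about 6 months)
print(alter_command(100))  # alter 'tsdb', NAME => 't', TTL => 8640000
```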

Upvotes: 3

Vitor

Reputation: 2792

There are no automated tools to do that in OpenTSDB. It may be possible to write one using its HTTP API, but you'd have to retrieve downsampled data, ask it to remove all points from that interval, and then insert the downsampled data again.
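A rough Python sketch of the three request bodies such a tool would need is below. It only builds JSON payloads and sends nothing. The field names (`queries`, `downsample`, `delete`, `dps`) and the endpoints mentioned in the comments are what I recall from the OpenTSDB 2.x HTTP API, so treat them as assumptions and check them against the docs for your version. As far as I know, deleting through a query also has to be enabled in the TSD configuration. The function names and the example metric are hypothetical.

```python
import json


def build_downsample_query(metric, tags, start, end, interval="1m", fn="avg"):
    """Body for a POST to /api/query that fetches downsampled data (assumed schema)."""
    return {
        "start": start,
        "end": end,
        "queries": [{
            "aggregator": "sum",  # harmless when tags pin down a single series
            "metric": metric,
            "tags": tags,
            "downsample": f"{interval}-{fn}",
        }],
    }


def build_delete_query(metric, tags, start, end):
    """Body for a POST to /api/query that deletes the raw points (assumed schema)."""
    return {
        "start": start,
        "end": end,
        "delete": True,  # assumption: supported and enabled on the TSD
        "queries": [{"aggregator": "sum", "metric": metric, "tags": tags}],
    }


def build_put_payload(metric, tags, dps):
    """Body for a POST to /api/put; dps is a {timestamp: value} mapping
    in the shape I expect a query result's "dps" field to have."""
    return [
        {"metric": metric, "timestamp": int(ts), "value": value, "tags": tags}
        for ts, value in sorted(dps.items(), key=lambda kv: int(kv[0]))
    ]


tags = {"host": "web01"}  # hypothetical series
print(json.dumps(build_downsample_query("sys.cpu.user", tags, 0, 3600)))
print(json.dumps(build_delete_query("sys.cpu.user", tags, 0, 3600)))
print(json.dumps(build_put_payload("sys.cpu.user", tags, {"60": 1.5, "0": 2.0})))
```

The order matters: fetch and safely store the downsampled points first, then delete, then re-insert. Otherwise a failure midway loses data.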

This is something that will probably never be implemented in OpenTSDB, as one of its key features is storing data at full resolution forever. If you really need this feature, maybe another TSDB, like Graphite, would better suit your requirements?

Upvotes: 1
