user9489451
user9489451

Reputation: 35

DB folder utilising lot of space creating space issue

I have a grafana windows server.Where we have integrated HyperV snaphot related infor as well as CPU, Memory usage of HV's etc. I could see below folder in our grana windows server

C:\InfluxDB\data\telegraf\autogen

Under this autogen folder, I can see multiple subfolder with .tsm files. Each file create every 7 days and the folder size is around 4 to 5GB. There are many files in this autogen folder from 2nd Feb 2017 to 14 Mar 2018 which is utilizing around 225GB space.

Upvotes: 3

Views: 4170

Answers (1)

WindyFields
WindyFields

Reputation: 2875

What you see: autogen is a default Retention Policy (RP) auto-created by InfluxDB and has an infinite data retention duration. All datapoints in Influx are logically stored in shards. Physically shards data is compressed and stored in .tsm files. Shards are unified into shards groups. Each shard group covers a specific time range defined by so-called shard duration and stores datapoints belonging to this time interval. By default for RP with retention duration > 6 month shard group duration is set to 7 days.

For more info see docs on storage engine.

Regarding your questions:

  • "Is there anyway we can shrink the size of autogen file?"
    Probably no. The only thing you can do is to rely on InfluxDB internal compression. Here they say that it may be improved if you increase shard duration.
    *Although, because InfluxDB drop the whole shard rather then separate datapoints, the increase of shard duration will make your data to be stored until the whole shard goes out of scope of current retention duration and only then it will be dropped. Though, if you have an infinite retention duration it doesn't matter. This leads us to the second question.
  • "Is it possible to delete the old file under autogen folder?"
    If you can afford loosing old data or can't afford to much storage space InfluxDB lets to specify data Retention Policy (RP), already mentioned above. Basically, all your measurements are associated with a specific RP and the data will be deleted as soon as retention duration comes to the end. So if you specify a RP of 1 year, InfluxDB will automatically delete all datapoints older then now() - 1 year. RP is a standard (and pretty obvious) way of dealing with storage issues. A logical continuation of RP idea is to group and aggregate your data over time over longer discrete time intervals (downsampling). In Influx it can be achieved with Continuous Queries (CQ). You can read more of data retention and downsamping here.

In conclusion, storage limitation are inevitable and properly configured retention policies is the way to go.

Upvotes: 6

Related Questions