Alexander Buczynsky

Reputation: 51

TimescaleDB materialized row is too big

Summary

I am stuck on a problem where the row size of a materialized view exceeds the maximum allowed in a PostgreSQL database.

Description

The table named PAC4200_Metering has 108 columns, all double precision. I would like to store a materialized cache of the table, where each column is represented as a JSON object with keys for the average, max, min, etc. of that column during the time bucket.

SELECT
  "deviceId",
  time_bucket('1 hours', "time") as starttime,
  json_build_object(
   'average', avg("voltage_an"),
   'maxvalue', max("voltage_an"),
   'minvalue', min("voltage_an"),
   'sum', sum("voltage_an"),
   'firstvalue', first("voltage_an", "time"),
   'firsttime', min("time" AT TIME ZONE 'UTC'),
   'lasttime', max("time" AT TIME ZONE 'UTC'),
   'lastvalue', last("voltage_an", "time"),
   'sd', stddev_pop("voltage_an"),
   'countgood', COUNT(*),
   'countbad', 0,
   'countuncertain', 0
  ) AS "voltage_an"
  ...
FROM
  "PAC4200_Metering"
GROUP BY
  "deviceId",
  starttime

Error Response:

INFO:  new materialization range for public.PAC4200_Metering larger than allowed in one run, truncating (time column time) (1568760300000000)
INFO:  new materialization range for public.PAC4200_Metering (time column time) (1568708100000000)
INFO:  materializing continuous aggregate public.PAC4200_Metering_15_minute: new range up to 1568708100000000

ERROR:  row is too big: size 12456, maximum size 8160
CONTEXT:  SQL statement "INSERT INTO _timescaledb_internal._materialized_hypertable_1108 SELECT * FROM _timescaledb_internal._partial_view_1108 AS I WHERE I.starttime >= '-infinity' AND I.starttime < '2019-09-17 08:15:00+00';"
SQL state: 54000

Alternative Methods I have Tried

Instead of populating JSON objects, I have tried storing each key of the JSON object (average, min, max, etc.) as separate columns named like "voltage_an_avg", "voltage_an_min", "voltage_an_max" and accessing them that way, but I still run into the same limitation.
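
For illustration, the flat-column variant of the aggregate looked roughly like this (the aliases are just examples following the naming pattern above):

SELECT
  "deviceId",
  time_bucket('1 hours', "time") as starttime,
  avg("voltage_an") as "voltage_an_avg",
  min("voltage_an") as "voltage_an_min",
  max("voltage_an") as "voltage_an_max",
  ...
FROM
  "PAC4200_Metering"
GROUP BY
  "deviceId",
  starttime

With 108 source columns and roughly a dozen aggregate keys each, this still flattens into well over a thousand 8-byte output columns, which is again more than 8160 bytes per row.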

Main Question / Solutions

How can I work around this row size limit, or restructure the continuous aggregate so that each row fits within a page?

Upvotes: 3

Views: 1052

Answers (1)

k_rus

Reputation: 3219

You cannot change the row limit: each row must fit on a single page, which is 8 KB.
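
For reference, the page size is fixed when PostgreSQL is built; you can inspect it, but not change it at runtime:

-- shows the page size of this PostgreSQL build, 8192 bytes by default
SHOW block_size;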

Since the values are double precision fields, putting them into JSON might lose numeric precision; this is a limitation of the JSON format in general. Thus you need to drop the usage of JSON in the continuous aggregate, or reconsider the types so that they fit the NUMERIC field of JSON (single precision).

Since the row is limited to 8160 bytes, around 1000 double precision fields can fit. This can be enough for all your columns if you avoid materialising aggregates that can be calculated later. For example, the average can be calculated from sum and countgood, while countbad and countuncertain contain no information in your example.

If the OP's query is used for creating the continuous aggregate, it can be rewritten into:

SELECT
  "deviceId",
  time_bucket('1 hours', "time") as starttime,
  max("voltage_an") as maxvalue,
  min("voltage_an") as minvalue,
  sum("voltage_an") as sum,
  first("voltage_an", "time") as firstvalue,
  min("time" AT TIME ZONE 'UTC') as firsttime,
  max("time" AT TIME ZONE 'UTC') as lasttime,
  last("voltage_an", "time") as lastvalue,
  stddev_pop("voltage_an") as sd,
  COUNT(*) as countgood,
  ...
FROM
  "PAC4200_Metering"
GROUP BY
  "deviceId",
  starttime
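
For completeness, here is a sketch of wrapping this query into a continuous aggregate (the view name is made up, and the exact DDL depends on your TimescaleDB version; this is the 1.x-style form that matches the log output in the question):

CREATE VIEW "PAC4200_Metering_1_hour"
WITH (timescaledb.continuous) AS
SELECT
  "deviceId",
  time_bucket('1 hours', "time") as starttime,
  max("voltage_an") as maxvalue,
  min("voltage_an") as minvalue,
  sum("voltage_an") as sum,
  ...
FROM
  "PAC4200_Metering"
GROUP BY
  "deviceId",
  starttime;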

If voltage_an is one of 108 such columns and 8 aggregates are calculated for each as above, the row will be at most 108*8*8 + 3*8 = 6912 + 24 = 6936 bytes, which fits the limit.

Then you can get the same result as the original query from the continuous aggregate (CAGG below stands for the name of the continuous aggregate view):

SELECT
  "deviceId",
  starttime,
  json_build_object(
   'average', "sum"/"countgood",
   'maxvalue', "maxvalue",
   'minvalue', "minvalue",
   'sum', "sum",
   'firstvalue', "firstvalue",
   'firsttime', "firsttime",
   'lasttime', "lasttime",
   'lastvalue', "lastvalue",
   'sd', "sd",
   'countgood', "countgood",
   'countbad', 0,
   'countuncertain', 0
  ) AS "voltage_an"
  ...
FROM
  CAGG

It is also possible to define several continuous aggregates and then join them.
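
For example, if the columns are split across two continuous aggregates (the view names below are invented), they can be joined back together on the grouping keys:

SELECT *
FROM "PAC4200_Metering_agg_1"
JOIN "PAC4200_Metering_agg_2"
  USING ("deviceId", starttime);

With USING, the shared "deviceId" and starttime columns appear only once in the output.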

I would recommend thinking carefully about how much information really needs to be materialised, as nothing comes for free: it occupies space, and having each row occupy almost an entire 8K page can further affect PostgreSQL's efficiency.

Upvotes: 1
