Reputation: 51
I am stuck on a problem where the row size of a materialized view exceeds the maximum allowed in a PostgreSQL database.
The table, named PAC4200_Metering, has 108 fields that are all double precision. I would like to store a materialized cache of the table in which each field is stored as a JSON object with keys for the average, max, min, etc. of that field during the time bucket.
SELECT
    "deviceId",
    time_bucket('1 hours', "time") as starttime,
    json_build_object(
        'average', avg("voltage_an"),
        'maxvalue', max("voltage_an"),
        'minvalue', min("voltage_an"),
        'sum', sum("voltage_an"),
        'firstvalue', first("voltage_an", "time"),
        'firsttime', min("time" AT TIME ZONE 'UTC'),
        'lasttime', max("time" AT TIME ZONE 'UTC'),
        'lastvalue', last("voltage_an", "time"),
        'sd', stddev_pop("voltage_an"),
        'countgood', COUNT(*),
        'countbad', 0,
        'countuncertain', 0
    ) AS "voltage_an"
    ...
FROM
    "PAC4200_Metering"
GROUP BY
    "deviceId",
    starttime
INFO: new materialization range for public.PAC4200_Metering larger than allowed in one run, truncating (time column time) (1568760300000000)
INFO: new materialization range for public.PAC4200_Metering (time column time) (1568708100000000)
INFO: materializing continuous aggregate public.PAC4200_Metering_15_minute: new range up to 1568708100000000
ERROR: row is too big: size 12456, maximum size 8160
CONTEXT: SQL statement "INSERT INTO _timescaledb_internal._materialized_hypertable_1108 SELECT * FROM _timescaledb_internal._partial_view_1108 AS I WHERE I.starttime >= '-infinity' AND I.starttime < '2019-09-17 08:15:00+00';"
SQL state: 54000
Instead of populating JSON objects, I have also tried storing each key of the JSON object (average, min, max, etc.) as separate fields named like "voltage_an_avg", "voltage_an_min", "voltage_an_max" and accessing them that way, but I still run into the same limitation.
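Roughly, that flat-column variant looks like this (abbreviated; the same block repeats for each of the 108 fields, so the aggregates shown are just illustrative):
SELECT
    "deviceId",
    time_bucket('1 hours', "time") as starttime,
    avg("voltage_an") AS "voltage_an_avg",
    min("voltage_an") AS "voltage_an_min",
    max("voltage_an") AS "voltage_an_max",
    ...
FROM
    "PAC4200_Metering"
GROUP BY
    "deviceId",
    starttime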
Upvotes: 3
Views: 1052
Reputation: 3219
You cannot change the row size limit, as each row must fit on a single page, which is 8 KB.
Since the values are double precision fields, putting them into JSON might lose numeric precision; this is a limitation of the JSON format in general. Thus you either need to drop JSON from the continuous aggregate or reconsider the types so that they fit the numeric representation of JSON (single precision).
Since the row is limited to 8160 bytes, around 1000 double precision fields can fit. This can be enough to fit all your columns if you avoid materialising aggregates that can be calculated later. For example, average can be calculated from sum and countgood, while countbad and countuncertain contain no information in your example.
If the OP's query is used for creating a continuous aggregate, it can be rewritten as:
SELECT
    "deviceId",
    time_bucket('1 hours', "time") as starttime,
    max("voltage_an") as maxvalue,
    min("voltage_an") as minvalue,
    sum("voltage_an") as sum,
    first("voltage_an", "time") as firstvalue,
    min("time" AT TIME ZONE 'UTC') as firsttime,
    max("time" AT TIME ZONE 'UTC') as lasttime,
    last("voltage_an", "time") as lastvalue,
    stddev_pop("voltage_an") as sd,
    COUNT(*) as countgood,
    ...
FROM
    "PAC4200_Metering"
GROUP BY
    "deviceId",
    starttime
If voltage_an is one of 108 different values and 8 aggregates are calculated above, it will be 108*8*8 + 3*8 = 6912 + 24 = 6936 bytes max.
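For completeness, here is a sketch of how such a query can be wrapped into a continuous aggregate. The exact syntax depends on the TimescaleDB version; the log output above suggests the 1.x form (CREATE VIEW ... WITH (timescaledb.continuous)), and the view name is just a placeholder:
-- Sketch only: TimescaleDB 1.x continuous aggregate definition.
-- On TimescaleDB 2.x the first line becomes CREATE MATERIALIZED VIEW instead.
CREATE VIEW "PAC4200_Metering_1_hour"
WITH (timescaledb.continuous) AS
SELECT
    "deviceId",
    time_bucket('1 hours', "time") as starttime,
    max("voltage_an") as maxvalue,
    ...           -- the rest of the SELECT list from the query above
FROM
    "PAC4200_Metering"
GROUP BY
    "deviceId",
    starttime;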
Then you can get the same result as the original query from the continuous aggregate:
SELECT
    "deviceId",
    starttime,
    json_build_object(
        'average', "sum"/"countgood",
        'maxvalue', "maxvalue",
        'minvalue', "minvalue",
        'sum', "sum",
        'firstvalue', "firstvalue",
        'firsttime', "firsttime",
        'lasttime', "lasttime",
        'lastvalue', "lastvalue",
        'sd', "sd",
        'countgood', "countgood",
        'countbad', 0,
        'countuncertain', 0
    ) AS "voltage_an"
    ...
FROM
    CAGG
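If downstream code expects the original JSON shape, one option (a sketch, with placeholder names taken from the continuous aggregate above) is to wrap that query into an ordinary, non-materialised view, so the JSON objects are built on read while the stored rows stay flat and under the 8 KB limit:
-- Sketch: rebuild the JSON objects at query time instead of storing them.
CREATE VIEW "PAC4200_Metering_1_hour_json" AS
SELECT
    "deviceId",
    starttime,
    json_build_object(
        'average', "sum"/"countgood",
        'maxvalue', "maxvalue",
        ...
    ) AS "voltage_an"
    ...
FROM
    "PAC4200_Metering_1_hour";   -- the continuous aggregate (CAGG above)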
It is also possible to define several continuous aggregates and then join them.
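A sketch of that approach, with placeholder names, where the 108 fields are split across two continuous aggregates and joined back on the grouping keys:
-- Sketch: two narrower continuous aggregates joined on deviceId and starttime.
SELECT
    a."deviceId",
    a.starttime,
    ...                       -- columns materialised in either aggregate
FROM "PAC4200_Metering_1_hour_part_a" a
JOIN "PAC4200_Metering_1_hour_part_b" b
    ON a."deviceId" = b."deviceId"
   AND a.starttime = b.starttime;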
I would recommend thinking carefully about how much information really needs to be materialised, as nothing comes for free: for example, it occupies space. Furthermore, having each row occupy an entire 8 KB page can further affect PostgreSQL's efficiency.
Upvotes: 1