Reputation: 19
We are using Amazon RDS to host our PostgreSQL databases. Our production instance (db.t3.xlarge, Single-AZ) was running smoothly until suddenly the `Read IOPS`, `Read Latency`, `Read Throughput`, and `Disk Queue Depth` metrics in the AWS console increased rapidly and stayed high afterward (with lower variability), whereas `Write IOPS` and `Write Throughput` remained normal.
There were no code changes or deployments on the date of the increase. There were no significant increases in user activity either.
About our DB structure: we have a single table that holds all of our data, with these fields: `id` as UUID (the primary key), `type` as VARCHAR, `data` as JSONB (holds the actual data), and `createdAt` and `updatedAt` as timestamp with time zone. Most of our `data` values are larger than 2 KB, so most rows are stored in the TOAST table. We have 20 B-tree indexes created on frequently used fields inside the JSONB column.
So far we have tried `VACUUM ANALYZE` and also completely rebuilding the table (creating a new table, copying all data from the old one, and recreating all the indexes). Neither changed the behavior.
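For anyone checking similar symptoms: the heap vs. TOAST vs. index footprint, and whether each index is actually used, can be inspected with the statistics catalogs. A sketch, using a hypothetical table name `events`:

```sql
-- Heap + TOAST vs. index sizes (pg_table_size includes the TOAST table)
SELECT pg_size_pretty(pg_table_size('events'))    AS heap_plus_toast,
       pg_size_pretty(pg_relation_size('events')) AS heap_only,
       pg_size_pretty(pg_indexes_size('events'))  AS indexes;

-- Per-index scan counts: rarely used indexes still cost writes and cache space
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'events'
ORDER BY idx_scan;
```

With 20 indexes on one table, the combined index size can easily exceed what fits in shared buffers, which would show up as exactly this kind of read-IOPS increase once the working set no longer fits in cache.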
We also tried increasing storage, which raises the baseline IOPS. It helped a bit, but performance is still not what it was before.
What could be the root cause of this problem? How can we fix it permanently (without increasing storage or the instance size)? For now we are looking for easy changes; we will improve our data model in the future.
Upvotes: 1
Views: 2204
Reputation: 141
T3 instances are burstable and not well suited to production workloads. Try moving to another family, such as a C or M type. You may have hit a burst limit (t3 CPU credits, or the gp2 I/O burst balance) that is now causing the odd behaviour.
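One way to confirm this theory is to chart the `BurstBalance` (gp2 I/O credits) and `CPUCreditBalance` (t3 CPU credits) RDS metrics around the date the latency changed. A sketch with the AWS CLI; the instance identifier, region, and time window below are placeholders you would replace with your own:

```shell
# Placeholders: replace mydb / eu-west-1 and the time window with your values.
aws cloudwatch get-metric-statistics \
  --region eu-west-1 \
  --namespace AWS/RDS \
  --metric-name BurstBalance \
  --dimensions Name=DBInstanceIdentifier,Value=mydb \
  --start-time 2020-01-01T00:00:00Z --end-time 2020-01-08T00:00:00Z \
  --period 3600 --statistics Minimum

# The same query with --metric-name CPUCreditBalance shows the t3 CPU credits.
```

If either balance dropped to zero at the time the read metrics jumped, the instance fell back to baseline performance, which would explain the sudden and sustained change.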
Upvotes: 1