ShwetaJ

Reputation: 519

How to resolve Amazon RDS Postgresql instance's DiskFull error?

We have a very small database storing some relational data in an Amazon RDS instance. The PostgreSQL engine version is 12.7.

There are a number of Lambda functions in AWS in the same region that access this instance to insert records; in the process, some join queries are also used. We use the psycopg2 Python library to interact with the DB. Since the data is very small, we have used a db.t2.small instance with 20 GB of storage and 1 CPU. In production, however, a db.t2.medium instance is used. Auto Scaling has not been enabled.

Recently, we have started experiencing an issue with this database. After the Lambda functions run for a while, at some point they time out, either because the database takes too long to return a response or because it throws a DiskFull error as follows:

DiskFull
could not write to file "base/pgsql_tmp/pgsql_tmp1258.168": No space left on device

I have referred to this documentation to identify the cause: Troubleshoot RDS DiskFull error

Following are the queries for checking the DB file size:

SELECT pg_size_pretty(pg_database_size('db_name'));

The response of this query is 35 MB.

SELECT pg_size_pretty(SUM(pg_relation_size(oid))) FROM pg_class;

The output of the above query is 33 MB. As we can see, the DB file size is very small. However, on checking the size of the temporary files, we see the following:

SELECT datname, temp_files AS "Temporary files",temp_bytes AS "Size of temporary files" FROM pg_stat_database;

(Screenshot of the query output: 145 temporary files, totalling roughly 18.69 GB.)

If we look at the size of the temporary files, it's roughly 18.69 GB, which is why the DB is throwing a DiskFull error.

Why is the PostgreSQL instance not deleting the temporary files after the queries have finished? Even after rebooting the instance, the temporary file size is the same (and rebooting is not a feasible solution anyway, as we want the DB to delete the temporary files on its own). Also, how do I avoid the DiskFull error, as I may want to run more Lambda functions that interact with the DB?

Just for additional information, I am including some RDS monitoring graphs of CPU Utilisation and Free Storage Space taken while the DB slowed down:

(Screenshots: CPU Utilisation and Free Storage Space graphs.)

From this, I am guessing that we probably need to enable auto scaling, as the CPU Utilisation hits 83.5%. I would highly appreciate it if someone could share some insights to help resolve the DiskFull error and identify why the temporary files are not deleted.

One of the join queries the lambda function runs on the database is:

SELECT DISTINCT
    scl1.*, scl2.date_to AS compiled_date_to
FROM
    logger_main_config_column_name_loading
JOIN 
    column_name_loading ON column_name_loading.id = logger_main_config_column_name_loading.column_name_loading_id
JOIN 
    sensor_config_column_name_loading ON sensor_config_column_name_loading.column_name_loading_id = column_name_loading.id
JOIN 
    sensor_config_loading AS scl1 ON scl1.id = sensor_config_column_name_loading.sensor_config_loading_id
INNER JOIN (
    SELECT id, hash, min(date_from) AS date_from, max(date_to) AS date_to
    FROM sensor_config_loading
    GROUP BY id, hash
) AS scl2
ON scl1.id = scl2.id AND scl1.hash=scl2.hash AND scl1.date_from=scl2.date_from
WHERE 
    logger_main_config_loading_id = %(logger_main_config_loading_id)s;

How can this query be optimized? Will running smaller queries in a loop be faster?
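
To find out which part of the query spills to disk, I understand the statement can be run under EXPLAIN (ANALYZE, BUFFERS) (note that ANALYZE actually executes the query), and any plan node that uses temporary files then reports temp read/written block counts. A sketch, with the psycopg2 placeholder bound as usual (or replaced by a literal id when testing in psql):

EXPLAIN (ANALYZE, BUFFERS)  -- ANALYZE runs the query; BUFFERS adds temp I/O counts
SELECT DISTINCT
    scl1.*, scl2.date_to AS compiled_date_to
FROM
    logger_main_config_column_name_loading
JOIN
    column_name_loading ON column_name_loading.id = logger_main_config_column_name_loading.column_name_loading_id
JOIN
    sensor_config_column_name_loading ON sensor_config_column_name_loading.column_name_loading_id = column_name_loading.id
JOIN
    sensor_config_loading AS scl1 ON scl1.id = sensor_config_column_name_loading.sensor_config_loading_id
INNER JOIN (
    SELECT id, hash, min(date_from) AS date_from, max(date_to) AS date_to
    FROM sensor_config_loading
    GROUP BY id, hash
) AS scl2
ON scl1.id = scl2.id AND scl1.hash = scl2.hash AND scl1.date_from = scl2.date_from
WHERE
    logger_main_config_loading_id = %(logger_main_config_loading_id)s;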

Upvotes: 2

Views: 3644

Answers (1)

Laurenz Albe

Reputation: 248305

pg_stat_database does not show the current size and number of temporary files; it shows cumulative historical data. So your database has created 145 temporary files since the statistics were last reset.
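
For example, a quick sketch using the standard statistics views and functions (nothing RDS-specific): pg_stat_database also records when its counters were last zeroed, and pg_stat_reset() zeroes them for the current database:

SELECT datname,
       temp_files,                        -- temp files created since stats_reset
       pg_size_pretty(temp_bytes) AS temp_size,
       stats_reset                        -- when these counters were last zeroed
FROM pg_stat_database;

SELECT pg_stat_reset();                   -- reset the counters for the current database

A reboot does not clear these counters, which is why the numbers looked unchanged after restarting the instance.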

Temporary files are deleted as soon as the query is done, no matter whether it succeeds or fails.

You get the error because some rogue queries write enough temporary files to fill the disk (perhaps some forgotten join conditions). To avoid the out-of-space condition, set the parameter temp_file_limit to a reasonable value and reload PostgreSQL (in postgresql.conf on a self-managed server; on Amazon RDS, via the instance's DB parameter group).
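
A minimal sketch for a self-managed server (the 5 GB value is only illustrative; on RDS, ALTER SYSTEM is not permitted, so temp_file_limit has to be set in the DB parameter group instead):

ALTER SYSTEM SET temp_file_limit = '5GB'; -- cancel any query whose temp files exceed 5 GB
SELECT pg_reload_conf();                  -- apply the change without a restart
SHOW temp_file_limit;                     -- verify the active setting

With the limit in place, a runaway query fails with a "temporary file size exceeds temp_file_limit" error instead of filling the disk and taking down everything else on the instance.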

Upvotes: 2
