Reputation: 1048
I have a db table which has around 5-6 Mn entries, and it takes around 20 minutes to vacuum it. Since one field of this table is updated very frequently, there are a lot of dead rows to deal with.
As an estimate, with our current user base it can accumulate around 2 million dead tuples on a daily basis. So vacuuming this table requires a lot of both read IO and write IO.
What would be an ideal way to vacuum this table? Should I increase autovacuum_vacuum_cost_limit to allow more operations per autovacuum run? But as far as I can see, that will increase IOPS, which again might hinder performance. Currently, I have autovacuum_vacuum_scale_factor = 0.2. Should I decrease it? If I decrease it, autovacuum will run more often; write IO will decrease, but it will lead to more periods with high read IO.
Also, as the user base grows, vacuuming will take more and more time, because the table will keep growing and vacuum will have to read a lot from disk. So, what should I do?
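For reference, I check the dead-tuple count and when the table was last vacuumed with something like this (my_table is just a placeholder for the actual table name):

```sql
-- Rough check of live vs. dead tuples and the last (auto)vacuum time.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'my_table';
```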
One solution I have thought of: split the frequently updated column out into a separate table, so that only that small table bloats and needs frequent vacuuming.
In general, what is the approach people take here? I assume plenty of people must have very big tables (10 GB or more) that need to be vacuumed.
Upvotes: 1
Views: 3068
Reputation: 44137
Separating the column into its own table is a viable strategy, but it would be a last resort for me. PostgreSQL already has a high per-row overhead, and doing this would double it (which might also remove most of the benefit). Plus, it would make your queries uglier, harder to read, harder to maintain, and easier to introduce bugs into. Where splitting it out would be most attractive is if index-only scans on a set of columns not including this one are important to you, and splitting it out lets you keep the visibility map in a better state for the table holding those remaining columns.
Why do you care that it takes 20 minutes? Is that causing something bad to happen? At that rate, you could vacuum this table 72 times a day, which seems to be way more often than it actually needs to be vacuumed. In v12, the default value for autovacuum_vacuum_cost_delay was dropped 10-fold, to 2ms. This change in default was not driven by changes in the code in v12, but rather by the realization that the old default was simply out of date for modern hardware in most cases. I would have no trouble pushing that change into a v11 config, but I don't think doing so would address your main concern either.
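If you do want to try that, a minimal sketch (this sets it cluster-wide via ALTER SYSTEM; the same parameter can also be set per table as a storage parameter):

```sql
-- Mirror the v12 default on an older server, then reload the configuration.
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
SELECT pg_reload_conf();
```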
Do you actually have a problem with the amount of IO you are generating, or is it just conjecture? The IO done is mostly sequential, but how much that matters depends on your storage hardware. Do you see latency spikes while the vacuum is happening? Are you charged per IO and your bill is too high? High IO is not inherently a problem; it is only a problem if it causes a problem.
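If you want to measure rather than guess, a running vacuum can be watched while it works, for example through pg_stat_progress_vacuum (available since 9.6):

```sql
-- Show the phase and block counters of any (auto)vacuum currently running.
SELECT p.pid, c.relname, p.phase,
       p.heap_blks_total, p.heap_blks_scanned, p.heap_blks_vacuumed
FROM pg_stat_progress_vacuum AS p
JOIN pg_class AS c ON c.oid = p.relid;
```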
Currently, I have autovacuum_vacuum_scale_factor = 0.2. Should I decrease it? If I decrease it, autovacuum will run more often; write IO will decrease, but it will lead to more periods with high read IO.
Running more often probably won't decrease your write IO by much, if at all. Every table/index page with at least one obsolete tuple needs to get written during every vacuum. Writing out a page just to remove one obsolete tuple causes more writing than waiting until there are a lot of obsolete tuples that can all be removed by one write. You might write somewhat less per vacuum, but doing more vacuums will make up for that, and probably far more than make up for it.
Upvotes: 3
Reputation: 246328
There are two approaches:

1. Reduce autovacuum_vacuum_cost_delay for that table so that autovacuum becomes faster. It will still consume I/O, CPU and RAM.

2. Set the fillfactor for the table to a value less than 100 and make sure that the column you update frequently is not indexed. Then you could get HOT updates, which don't require VACUUM (see the sketch below).
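A minimal sketch of both, assuming a hypothetical table mytab and an example fillfactor of 70 (the right value depends on your row size and update rate):

```sql
-- Approach 1: per-table storage parameter; a lower delay means a faster, less throttled autovacuum.
ALTER TABLE mytab SET (autovacuum_vacuum_cost_delay = 2);

-- Approach 2: leave free space in every page so updates can stay on the same page (HOT),
-- provided the updated column is not part of any index.
ALTER TABLE mytab SET (fillfactor = 70);
-- fillfactor only affects newly written pages; existing pages keep their layout
-- until they are rewritten (e.g. by VACUUM FULL, which takes an exclusive lock).

-- Verify that updates are actually HOT afterwards:
SELECT n_tup_upd, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'mytab';
```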
Upvotes: 1