Isaac
Isaac

Reputation: 2364

Mac w/PostgreSQL flush/empty cache for performance tuning

This question is going to be a bit specific because I have tried A LOT of things out there and none of it has worked for me. I'm hoping someone out there might have another idea.

I am working with PostgreSQL on a Mac (OS High Sierra) and I am trying to improve the performance for generating a materialized view, but can't compare my changes anymore because it seems PostgreSQL has cached the materialized view. It used to take ~12 minutes to generate the materialized view, and now it's taking less than 10 seconds (same code, I reverted the changes).

I used EXPLAIN (ANALYZE, BUFFERS) to confirm that almost all of the data getting fetched by the query to generate the materialized view is a hit (cached), and there were almost no disk reads.

I do not know if the information is cached in PostgreSQL's shared buffers or in the OS cache because at this point I've done things that I thought would have cleared both.

Here is what I have tried for emptying the PostgreSQL cache:

  1. Restarted PostgreSQL server using brew services stop postgres, and then brew services start postgres (also tried calling sync && sudo purge in between). I confirmed with top as well as grep that postgres was no longer running.
  2. Used DISCARD ALL, as well as DISCARD with its other options.
  3. Set the shared_buffers setting in postgresql.conf to the minimum (128k).
  4. Installed, compiled, and used pg_dropcache.
  5. I looked at pg_ctl for a bit but I'll admit I couldn't figure out how to use it. I got the error no database directory specified and environment variable PGDATA unset, and I am not sure what to set the -D/pgdata option to for my case.
  6. VACUUM. I know this shouldn't have had an effect, but I tried it anyway.

Here is what I have tried for emptying the operating system's cache:

  1. Restarted computer.
  2. Emptied ~/Library/Caches and /Library/Caches.
  3. sync && sudo purge as well as sync && purge.
  4. Booted up in Safe Mode.

I have also tried a few other things that I thought would force PostgreSQL to generate the materialized view from scratch (these would have been fine since I only need to test performance in dev for now):

  1. Cloned the main table used in the materialized view, and generated the materialized view from the clone. It still generated within 10 seconds.
  2. Scrambled some column values (first_name, last_name, mem_id (not the primary key)). It still generated within 10 seconds (and the materialized view was generated correctly with the newly scrambled values).

I am stuck and do not know what to try anymore. Any ideas/help would be appreciated!

Upvotes: 2

Views: 2614

Answers (1)

jjanes
jjanes

Reputation: 44363

Rebooting your computer clears both of the caches (unless you use something like autoprewarm from pg_prewarm, but that code has not be released yet). If the reboot doesn't cause the problem to reappear, then you have either fixed the problem permanently or didn't correctly understand it in the first place.

One possibility is that an ANALYZE (either manual, or auto) fixed some outdated statistics which was causing a poor plan to be used by the materialized view refresh. Another possibility is that a VACUUM means that now index-only scans no longer have to access the table pages, because they are marked as all-visible. If either of these is the case, and if you wanted to recreate the problem for some reason, you would have to restore the database to the state before VACUUM or ANALYZE was run.

EXPLAIN (ANALYZE, BUFFERS) only knows about shared_buffers. If something is a hit in the OS cache only, it will still be reported as a miss by EXPLAIN (ANALYZE, BUFFERS). If you freshly restarted PostgreSQL and the very first query run shows mostly buffer hits and only a few misses, that indicates your query is hitting the same buffers over and over again. This is common in index-only scans, for example, because for every row it consults one of just a handful of visibility map pages.

Upvotes: 1

Related Questions