gidgicken
gidgicken

Reputation: 21

Beginner PostgreSQL: Setting up environment for querying large pre-existing database

I'm brand spanking new to relational databases, and need help setting up a basic working environment for querying a large (pre-existing) database. I've connected to our remote server through PGAdmin, but all my basic queries are extremely slow.

Query

SELECT to_char(created, 'YYYY-MM-DD'), user_id 
FROM transactions
WHERE type = 'purchase'
AND created > NOW() AT TIME ZONE 'US/Mountain' - INTERVAL '1 month'
ORDER BY created;

EXPLAIN(BUFFERS, ANALYZE) output:

Index Scan using payments_transaction_created_42e34d6ca1e04ffe_uniq
      on     payments_transaction  (cost=0.44..339376.18 rows=481811 width=24) (actual time=2.643..49422.733 rows=511058 loops=1)
Index Cond: (created > (timezone('US/Mountain'::text, now()) - '1 mon'::interval))
Filter: ((type)::text = 'purchase'::text)
Rows Removed by Filter: 955691
Buffers: shared hit=405597 read=295625 written=764
Planning time: 0.111 ms
Execution time: 49569.324 ms

In my limited knowledge, the execution time seems much too long to me.

What steps should I take to create the most efficient environment possible? Does creating a local copy of the database mean faster queries? Are there other factors that could lead to such inefficiencies?

Remember, I'm brand new to databases, so there are no answers too simple.

Upvotes: 1

Views: 43

Answers (1)

Kamil Gosciminski
Kamil Gosciminski

Reputation: 17177

Your query seems to be using an index on transactions(created). You are returning around 0.5M of rows while discarding 2x as much.

Depending on the distribution of your values across type column you may benefit from adding an index on both your text and timestamp columns:

CREATE INDEX ON transactions(type, created);

A rule of thumb while adding indexes is to first index for equality operators and then for dates. It might actually speed up your query a lot - though as I've mentioned earlier depending on value distribution.

Remember to update table statistics after creating an index using:

ANALYZE transactions;

Testing on local copy of the database does mean faster processing because you are not sending packages over the net. Instead everything is being processed locally, but this shouldn't be that much of a deal with your query and it's better to always test in as much comparable environment to the production as you can.

Upvotes: 1

Related Questions