Brian

Reputation: 7155

Postgres query takes longer than expected, even w/ index on fields

I'm working on optimizing a Postgres table that stores information from a log file.

Here is the query:

SELECT c_ip as ip
     , x_ctx as file_name
     , date_time
     , live
     , c_user_agent as user_agent 
FROM events 
WHERE x_event = 'play' 
  AND date = '2012-12-01' 
  AND username = 'testing'

There are B-tree indexes on x_event, date, and username. The table has around 25 million rows. Right now the query takes about 20-25 seconds (correction: more like 40) and returns 143,000 rows.

Is that time expected? I would have thought it would be faster because of the indexes. Perhaps it's slow because of the sheer amount of data it has to go through?

EDIT: Here is the EXPLAIN ANALYZE:

Bitmap Heap Scan on events  (cost=251347.32..373829.74 rows=35190 width=56) (actual time=5768.409..6124.313 rows=143061 loops=1)
  Recheck Cond: ((date = '2012-12-01'::date) AND (username = 'testing'::text) AND (x_event = 'play'::text))
  ->  BitmapAnd  (cost=251347.32..251347.32 rows=35190 width=0) (actual time=5762.083..5762.083 rows=0 loops=1)
        ->  Bitmap Index Scan on index_events_fresh_date  (cost=0.00..10247.04 rows=554137 width=0) (actual time=57.568..57.568 rows=572221 loops=1)
              Index Cond: (date = '2012-12-01'::date)
        ->  Bitmap Index Scan on index_events_fresh_username  (cost=0.00..116960.55 rows=6328206 width=0) (actual time=3184.053..3184.053 rows=6245831 loops=1)
              Index Cond: (username = 'testing'::text)
        ->  Bitmap Index Scan on index_events_fresh_x_event  (cost=0.00..124112.84 rows=6328206 width=0) (actual time=2478.919..2478.919 rows=6245841 loops=1)
              Index Cond: (x_event = 'play'::text)
Total runtime: 6148.313 ms

I have several questions about that:

  1. Am I correct that there are 554,137 rows in the date index? There are fewer than 50 distinct dates that should be in there.
  2. How do I know which of the three listed indexes it is using?
  3. The total runtime listed was around 6 seconds, but when I run the query w/o EXPLAIN ANALYZE, it takes around 40 seconds.

Upvotes: 3

Views: 1636

Answers (2)

Clodoaldo Neto

Reputation: 125464

If 5.7 seconds is not good enough, you can try a multi-column index:

create index index_name on events (username, date, x_event);

I placed username first as I guess it is the column with the highest cardinality.
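
After creating it, you can verify with EXPLAIN ANALYZE that the planner picks the new index in place of the three bitmap scans (a sketch using the query from the question; the actual plan depends on your data):

explain analyze
select c_ip, x_ctx, date_time, live, c_user_agent
from events
where x_event = 'play'
  and date = '2012-12-01'
  and username = 'testing';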

Upvotes: 1

Eelke

Reputation: 22043

First, as Scott Marlowe says, the query only takes about 6 seconds to run; the rest is transfer time. It seems slower without EXPLAIN ANALYZE because the result set is much larger than the ten lines of EXPLAIN ANALYZE output and thus takes longer to transfer. If you turned on query logging and ran this query, you would probably find in the log that the query without EXPLAIN ANALYZE runs even faster (EXPLAIN ANALYZE itself slows things down). BTW, pgAdmin is quite slow itself, if that is what you are using.
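
A minimal sketch of enabling that logging, assuming superuser rights (a threshold of 0 ms logs every statement along with its duration):

-- log the duration of every statement in this session;
-- for all sessions, set it in postgresql.conf and reload
set log_min_duration_statement = 0;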

As for the number of rows in the date index, PostgreSQL is right. Even if you only have 50 distinct values, all rows will be in the index. Of course the B-tree part itself will only contain the 50 distinct values, but under each leaf value it will have a list of all rows for that value. There is the special case of a partial index (an index with a WHERE clause), which would only contain the rows matching that clause, but I do not expect you are using that, right?
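
For illustration, a partial index would look like this (hypothetical index name; it only contains the 'play' rows, so only queries that filter on x_event = 'play' can use it):

create index index_events_fresh_play on events (username, date)
where x_event = 'play';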

It is using all three indexes listed in the output of EXPLAIN ANALYZE. In this case it scans each index into a bitmap with a bit set for each row matching that index's criterion. These three bitmaps can then very quickly be ANDed together into a bitmap containing the rows matching the combined criteria (the BitmapAnd node in the plan).

Upvotes: 1
