user6137566
user6137566

Reputation: 71

Large amount of data query

I have a partition table, now about 200 multiple partitions, each partition table inside about 12 million data. Now select very slow, table index of the corresponding field has been established, but is still very slow, I see execution plan, found a large number of from the disk to read data, to me, I changed how to adjust and optimize it

   gjdd4=# \d t_bus_position_20160306_20160308
               Table "public.t_bus_position_20160306_20160308"
         Column         |              Type              |     Modifiers      
------------------------+--------------------------------+--------------------
 pos_uuid               | character varying(20)          | collate zh_CN.utf8
 pos_line_uuid          | character varying(20)          | 
 pos_line_type          | character varying(20)          | 
 pos_bus_uuid           | character varying(20)          | collate zh_CN.utf8
 pos_dev_uuid           | character varying(20)          | 
 pos_sta_uuid           | character varying(20)          | 
 pos_drv_ic_card        | character varying(30)          | 
 pos_lng                | character varying(30)          | 
 pos_lat                | character varying(30)          | 
 pos_bus_speed          | character varying(20)          | 
 pos_real_time_status   | character varying(20)          | 
 pos_gather_time        | timestamp(6) without time zone | 
 pos_storage_time       | timestamp(6) without time zone | 
 pos_is_offset          | boolean                        | 
 pos_is_overspeed       | character varying(1)           | 
 pos_cursor_over_ground | character varying(20)          | 
 pos_all_alarms         | character varying(30)          | 
 pos_is_in_station      | character varying(1)           | 
 pos_closed_alarms      | character varying(30)          | 
 pos_dis_to_pre_i       | integer                        | 
 pos_odometer_i         | bigint                         | 
 pos_relative_location  | real                           | 
 pos_dis_to_pre         | real                           | 
 pos_odometer           | double precision               | 
 pos_gather_time1       | bigint                         | 
Indexes:
    "idx_multi" btree (pos_bus_uuid, pos_gather_time DESC)
    "idx_trgm" btree (replace(to_char(pos_gather_time, 'YYYYMMDDHH24'::text), ' '::text, ''::text))
    "idx_trgm1" btree (to_char(pos_gather_time, 'YYYYMMDD'::text))
    "tp_20160306_20160308_pos_dev_uuid_idx" btree (pos_dev_uuid)
Check constraints:
    "t_bus_position_20160306_20160308_pos_gather_time_check" CHECK (pos_gather_time >= '2016-03-06 00:00:00'::timestamp without time zone AND
 pos_gather_time < '2016-03-09 00:00:00'::timestamp without time zone)

The plan is like this.

    gjdd4=# explain(costs,buffers,timing,analyze) select pos_bus_uuid from    test2 group by pos_bus_uuid;
     HashAggregate  (cost=802989.75..802993.00 rows=325 width=21) (actual time=42721.528..42721.679 rows=354 loops=1)
       Group Key: pos_bus_uuid
       Buffers: shared hit=3560 read=567491
       I/O Timings: read=20231.511
       ->  Seq Scan on test2  (cost=0.00..756602.00 rows=18555100 width=21) (actual time=0.067..27749.533 rows=18555100 loops=1)
             Buffers: shared hit=3560 read=567491
             I/O Timings: read=20231.511
     Planning time: 0.116 ms
     Execution time: 42721.839 ms
    (9 rows)

    Time: 42722.629 ms

Upvotes: 1

Views: 66

Answers (2)

trucks are heavy
trucks are heavy

Reputation: 76

Markus Winand's answer is correct - you need to copy the full sql, syntax error is due to your not including last line - 'SELECT pos_bus_uuid FROM t WHERE pos_bus_uuid IS NOT NULL;'

Would have added this as a comment, but reputation too low to comment.

Upvotes: 1

Markus Winand
Markus Winand

Reputation: 8746

Your query does not do any real aggregation but merely distinct. If this is what you really want (all distinct pos_bus_uuid values) than you can use a technique called loose index scan:

Here the tailored query assuming pos_bus_uuid has a not null constraint:

WITH RECURSIVE t AS (
    (SELECT pos_bus_uuid FROM test2 ORDER BY pos_bus_uuid LIMIT 1)  -- parentheses required
  UNION ALL
     SELECT (SELECT pos_bus_uuid FROM test2
              WHERE pos_bus_uuid > t.pos_bus_uuid ORDER BY pos_bus_uuid LIMIT 1)
       FROM t
      WHERE t.pos_bus_uuid IS NOT NULL
   )
SELECT pos_bus_uuid FROM t WHERE pos_bus_uuid IS NOT NULL;

Your index pos_bus_uuid should be good enough for this query.

Upvotes: 1

Related Questions