Gang Liang

Reputation: 883

join of large tables with primary key

The table mydat has around 48.3M records, with this definition:

┌────────────────┬──────────────┬───────────┐
│     Column     │     Type     │ Modifiers │
├────────────────┼──────────────┼───────────┤
│ id             │ bigint       │ not null  │
│ dt             │ integer      │ not null  │
│ data           │ real         │           │
└────────────────┴──────────────┴───────────┘
Indexes:
    "mydat_pkey" PRIMARY KEY, btree (id, dt)

For each object, identified by id, there are about 40 records, denoted by the dt time field. The goal is to check the variation pattern between successive records, so the implementation joins each record with its next one (by dt) for each id. The query is as follows:

SELECT *
  FROM mydat AS dat1
  JOIN mydat AS dat2
    ON dat1.id = dat2.id
   AND dat1.dt = dat2.dt - 1;
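
For illustration only, a minimal sketch assuming the variation of interest is simply the change in data between successive records; the delta could be selected directly from the same join:

SELECT dat1.id,
       dat1.dt,
       dat2.data - dat1.data AS delta   -- change from each record to its next one
  FROM mydat AS dat1
  JOIN mydat AS dat2
    ON dat1.id = dat2.id
   AND dat1.dt = dat2.dt - 1;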

The query plan is given below. A merge join is used, and the query took forever to run. Also, the estimated row count of 247144681 is a serious over-estimate; PostgreSQL does not seem to take the uniqueness of (id, dt) into consideration.

┌───────────────────────────────────────────────────────────────────────────────────────────────┐
│                                          QUERY PLAN                                           │
├───────────────────────────────────────────────────────────────────────────────────────────────┤
│ Merge Join  (cost=19919125.46..25466155.03 rows=247144681 width=222)                          │
│   Merge Cond: ((dat1.id = dat2.id) AND (dat1.dt = ((dat2.dt - 1))))                           │
│   ->  Sort  (cost=9959562.73..10080389.92 rows=48330876 width=111)                            │
│         Sort Key: dat1.id, dat1.dt                                                            │
│         ->  Seq Scan on mydat dat1  (cost=0.00..982694.76 rows=48330876 width=111)            │
│   ->  Materialize  (cost=9959562.73..10201217.11 rows=48330876 width=111)                     │
│         ->  Sort  (cost=9959562.73..10080389.92 rows=48330876 width=111)                      │
│               Sort Key: dat2.id, ((dat2.dt - 1))                                              │
│               ->  Seq Scan on mydat dat2  (cost=0.00..982694.76 rows=48330876 width=111)      │
└───────────────────────────────────────────────────────────────────────────────────────────────┘

Out of curiosity, here is a naive join of mydat with itself:

SELECT *
  FROM mydat AS dat1
  JOIN mydat AS dat2
    ON dat1.id = dat2.id
   AND dat1.dt = dat2.dt;

The query plan is similar:

┌───────────────────────────────────────────────────────────────────────────────────────────────┐
│                                          QUERY PLAN                                           │
├───────────────────────────────────────────────────────────────────────────────────────────────┤
│ Merge Join  (cost=19919125.46..27878413.41 rows=427198811 width=222)                          │
│   Merge Cond: ((dat1.id = dat2.id) AND (dat1.dt = dat2.dt))                                   │
│   ->  Sort  (cost=9959562.73..10080389.92 rows=48330876 width=111)                            │
│         Sort Key: dat1.id, dat1.dt                                                            │
│         ->  Seq Scan on act_2003q1 dat1  (cost=0.00..982694.76 rows=48330876 width=111)       │
│   ->  Materialize  (cost=9959562.73..10201217.11 rows=48330876 width=111)                     │
│         ->  Sort  (cost=9959562.73..10080389.92 rows=48330876 width=111)                      │
│               Sort Key: dat2.id, dat2.dt                                                      │
│               ->  Seq Scan on act_2003q1 dat2  (cost=0.00..982694.76 rows=48330876 width=111) │
└───────────────────────────────────────────────────────────────────────────────────────────────┘

Such a query plan puzzles me. My question is: what is the best practice for these use cases? Thanks.

The above queries were tested on both PostgreSQL 9.5.3 on Windows and PostgreSQL 9.4.6 on Linux; the results are similar.


Following @Erwin's suggestion, lag over a window function was tested, and the result is much better than the initial merge-join approach: 511524 ms to finish the query. As Erwin pointed out, the query is not exactly the same as the original one; in particular, if there are gaps in the dt field, some unwanted records would be included.
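
To keep the result set equivalent to the original join, the window-function version could be restricted to strictly successive records along these lines (a sketch; it assumes "successive" means dt values exactly one apart):

SELECT *
FROM (
   SELECT *,
          lag(data) OVER (PARTITION BY id ORDER BY dt) AS last_data,
          lag(dt)   OVER (PARTITION BY id ORDER BY dt) AS last_dt
   FROM   mydat
   ) t
WHERE  last_dt = dt - 1;   -- drops rows with no immediate predecessor (including across gaps)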

This is an example where table partitioning would be beneficial, since the dataset I work with is even larger than the example given above. The bottom line is that PostgreSQL sorts all records on disk and does not use the index at all for either query.
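
Whether the sort spills to disk can be checked (and, memory permitting, mitigated) per session; a sketch, with the work_mem value chosen only as an example:

SET work_mem = '1GB';            -- session-level sort memory; pick a value the server can afford
EXPLAIN (ANALYZE, BUFFERS)
SELECT *, lag(data) OVER (PARTITION BY id ORDER BY dt) AS last_data
FROM   mydat;
-- "Sort Method: external merge  Disk: ..." means an on-disk sort;
-- "Sort Method: quicksort  Memory: ..." means the sort fit in work_mem.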

Upvotes: 1

Views: 74

Answers (2)

Erwin Brandstetter

Reputation: 657932

If your actual goal is to have the previous value of data in each resulting row, then use a window function:

SELECT *, lag(data) OVER (PARTITION BY id ORDER BY dt) AS last_data
FROM   mydat;

The difference: rows for which there is no predecessor are still included. You may or may not want that. To exclude them, use a subquery:

SELECT *
FROM (
   SELECT *, lag(data) OVER (PARTITION BY id ORDER BY dt) AS last_data
   FROM   mydat
   ) t
WHERE  last_data IS NOT NULL;

The remaining corner case: if data can be NULL, we cannot tell the difference between a genuine NULL value and "not found". In that case, use an impossible default value (one that cannot occur in the data) for the "not found" case, as demonstrated here:

SELECT *
FROM (
   SELECT *, lag(data, 1, '-infinity') OVER (PARTITION BY id ORDER BY dt) AS last_data
   FROM   mydat
   ) t
WHERE  last_data IS DISTINCT FROM '-infinity';

Each of these queries only needs a single sequential scan.

Upvotes: 2

Hogan

Reputation: 70526

I don't know PostgreSQL that well, but this would be magic on DB2 and MS SQL:

WITH get_em AS
( SELECT dat1.id, dat1.dt
  FROM mydat AS dat1
  JOIN mydat AS dat2
    ON dat1.id = dat2.id
   AND dat1.dt = dat2.dt - 1
)
SELECT mydat.*
FROM mydat
JOIN get_em ON mydat.id = get_em.id AND mydat.dt = get_em.dt

Upvotes: 0
