Hiep

Reputation: 2612

The same SQL query takes 4 minutes on CockroachDB and 35 ms on SQL Server. Am I missing something?

The database contains only 2 tables, transaction and wallet.

CockroachDB 19.2.6 runs on 3 Ubuntu machines

vs

SQL Server 2019 runs on 1 Windows Server 2019 machine

Here is the query:

select * from transaction t 
join wallet s on t.sender_id=s.id
join wallet r on t.receiver_id=r.id
limit 10;

1) I know that the infrastructure comparison is not entirely fair to CockroachDB, but even so, the difference is really too big. Am I missing something, or is CockroachDB just very slow for this particular SQL query?

2) When I execute this query, the CPU of all 3 CockroachDB nodes goes up to 100%. Is that normal?

Update: here is the EXPLAIN output. I'm not sure how to read it.

> explain select * from transaction t
  -> join wallet s on t.sender_id=s.id
  -> join wallet r on t.receiver_id=r.id
  -> limit 10;
          tree          |       field        |     description
+------------------------+--------------------+----------------------+
                         | distributed        | true
                         | vectorized         | false
  limit                  |                    |
  │                      | count              | 10
  └── hash-join          |                    |
        │                | type               | inner
        │                | equality           | (receiver_id) = (id)
        │                | right cols are key |
        ├── hash-join    |                    |
        │    │           | type               | inner
        │    │           | equality           | (sender_id) = (id)
        │    │           | right cols are key |
        │    ├── scan    |                    |
        │    │           | table              | transaction@primary
        │    │           | spans              | ALL
        │    └── scan    |                    |
        │                | table              | wallet@primary
        │                | spans              | ALL
        └── scan         |                    |
                         | table              | wallet@primary
                         | spans              | ALL

Upvotes: 3

Views: 908

Answers (2)

Nathan VanBenschoten

Reputation: 484

It appears that this is actually due to a difference in query plans between SQL Server and CockroachDB, but one that can be worked around in a few ways.

The root problem is that the transaction table has two foreign key constraints pointing at the wallet table, but both foreign key columns are nullable. This prevents CockroachDB from pushing the 10-row limit through the joins, because the scan on the transaction table may need to produce more than 10 rows for the entire query to produce up to 10 rows.
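For concreteness, here is a minimal sketch of a schema that reproduces the situation (the column types are assumptions; the important part is the two nullable foreign key columns):

create table wallet (
    id uuid primary key
);

create table transaction (
    id uuid primary key,
    sender_id uuid references wallet (id),   -- nullable by default
    receiver_id uuid references wallet (id)  -- nullable by default
);

Because sender_id and receiver_id may be NULL, an inner join against wallet can discard transaction rows, so the optimizer cannot assume that scanning just 10 transaction rows is enough.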

We see this in the query plan:

> explain select * from transaction t
  join wallet s on t.sender_id=s.id
  join wallet r on t.receiver_id=r.id
  limit 10;
                    info
---------------------------------------------
  distribution: full
  vectorized: true

  • limit
  │ count: 10
  │
  └── • lookup join
      │ table: wallet@primary
      │ equality: (receiver_id) = (id)
      │ equality cols are key
      │
      └── • lookup join
          │ table: wallet@primary
          │ equality: (sender_id) = (id)
          │ equality cols are key
          │
          └── • scan
                estimated row count: 10,000
                table: transaction@primary
                spans: FULL SCAN

Notice that the limit is applied after both joins.

There are two relatively straightforward ways to fix this. First, we could replace the joins with left joins. This will allow the limit to be pushed down to the scan on the transaction table because the left join will never discard rows.

> explain select * from transaction t
  left join wallet s on t.sender_id=s.id
  left join wallet r on t.receiver_id=r.id
  limit 10;
                  info
----------------------------------------
  distribution: full
  vectorized: true

  • lookup join (left outer)
  │ table: wallet@primary
  │ equality: (receiver_id) = (id)
  │ equality cols are key
  │
  └── • lookup join (left outer)
      │ table: wallet@primary
      │ equality: (sender_id) = (id)
      │ equality cols are key
      │
      └── • scan
            estimated row count: 10
            table: transaction@primary
            spans: LIMITED SCAN
            limit: 10

The other option is to make the referencing columns in the foreign key constraints NOT NULL. This also allows the limit to be pushed down to the scan on the transaction table, because once neither column can be NULL, even an inner join will never discard rows.

> alter table transaction alter column sender_id set not null;
ALTER TABLE

> alter table transaction alter column receiver_id set not null;
ALTER TABLE
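To confirm the change took effect, SHOW COLUMNS reports the nullability of each column:

> show columns from transaction;
-- sender_id and receiver_id should now show is_nullable = false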

> explain select * from transaction t
  join wallet s on t.sender_id=s.id
  join wallet r on t.receiver_id=r.id
  limit 10;
                  info
----------------------------------------
  distribution: full
  vectorized: true

  • lookup join
  │ table: wallet@primary
  │ equality: (receiver_id) = (id)
  │ equality cols are key
  │
  └── • lookup join
      │ table: wallet@primary
      │ equality: (sender_id) = (id)
      │ equality cols are key
      │
      └── • scan
            estimated row count: 10
            table: transaction@primary
            spans: LIMITED SCAN
            limit: 10

Upvotes: 3

rafiss

Reputation: 583

I believe it's possible your table has outdated statistics. (Though hopefully by this point the statistics have been refreshed automatically and you aren't seeing this problem anymore.) You can read about table statistics in the CockroachDB docs, which also describe how to manually create new statistics.
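For example, a manual refresh is a single statement per table (the statistics names here are arbitrary; the syntax is as documented by CockroachDB):

> create statistics tx_stats from transaction;
> create statistics wallet_stats from wallet;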

Upvotes: 0
