Evgeny Semyonov
Evgeny Semyonov

Reputation: 131

Large inner join

We have 2 tables with English words: words_1 and words_2 with fields(word as VARCHAR, ref as INT), where word - it's an english word, ref - reference on another(third) table(it's not important).

In each table all words are unique. First table contains some words that are not in second one(and on the contrary second table contains some unique words).

But most words in two tables are same.

Need to get: Result table with all distinct words and ref's.

Initial conditions

  1. Ref's for same tables can be different( dictionaries were loaded from different places).
  2. Words count 300 000 in each table, so inner join is not convinient

Examples

words_1
________
Health-1
Car-3
Speed-5

words_2
_________
Health-2
Buty-6
Fast-8
Speed-9

Result table
_____________
Health-1
Car-3
Speed-5
Buty-6
Fast-8

Upvotes: 0

Views: 56

Answers (2)

Jaugar Chang
Jaugar Chang

Reputation: 3196

select word,min(ref)
from (
    select word,ref
    from words_1
    union all
    select word,ref
    from words_2
    ) t
group by word

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269503

Try using a full outer join:

select coalesce(w1.word, w2.word) as word, coalesce(w1.ref, w2.ref) as ref
from words_1 w1 full outer join
     words_2 w2
     on w1.word = w2.word;

The only time this will not work is if ref can be NULL in either table. In that case, change the on to:

on w1.word = w2.word and w1.ref is not null and w2.ref is not null

If you want to improve performance, just create an index on the tables:

create index idx_words1_word_ref on words_1(word, ref);
create index idx_words2_word_ref on words_2(word, ref);

A join is quite doable and even without the index, SQL Server should be smart enough to come up with a reasonable implementation.

Upvotes: 0

Related Questions