NOT IN Subquery slow and runs out of memory (Clickhouse)

Question

I have a single table holding DNA variants for different people. I want to show the variants that are unique to a person:

Table DNA (engine ordered by variant):

   person | variant
   p1     | v1 
   p1     | v2 
   p1     | v3 
   p2     | v2 
   p2     | v3 
   p3     | v2 
   p3     | v3
   p4     | v2 
   p4     | v3

So a simple query:

   select variant from DNA where person = 'p1' and variant 
     not in (select variant from DNA where person in ('p2', 'p3'))

will return all variants unique to p1 vs. p2 and p3 (p4 not considered for this query). However - it is slow and runs out of memory.

Should I be doing this a different way?

NOT IN Subquery slow and runs out of memory (Clickhouse)

Answers (1)

Related Questions