Ask613
Ask613

Reputation: 2795

PostgreSQL Select Join Not in List

The project is using Postgres 9.3

I have tables (that I have simplified) as follows:

t_person (30 million records)
- id
- first_name
- last_name
- gender

t_city (70,000 records)
- id
- name
- country_id

t_country (20 records)
- id
- name

t_last_city_visited (over 200 million records)
- person_id
- city_id
- country_id
  - There is a unique constraint on person_id, country_id to
    ensure that each person only has one last city per country

What I need to do are variations on the following:

Get the ids of Person who are female who have visited country 'UK' but have never visited country 'USA'

I have tried the following, but it is too slow.

select t_person.id from t_person
join t_last_city_visited
  on (
          t_last_city_visited.person_id = t_person.id
          and country_id = (select id from t_country where name = 'UK')
     )
where gender = 'female'
except
(
    select t_person.id from t_person
    join t_last_city_visited
      on (
             t_last_city_visited.person_id = t_person.id
             and country_id = (select id from t_country where name = 'USA')
         )
)

I would really appreciate any help.

Upvotes: 2

Views: 5903

Answers (3)

Nelson Teixeira
Nelson Teixeira

Reputation: 6572

This is the way I would approach it, you can later substitute the inner queries by a with alias as @zedfoxus said

select 
    id 
from 
    (SELECT
        p.id id
    FROM
        t_person p JOIN t_last_city_visited lcv
            ON(lcv.person_id = p.id)
        JOIN country c
            ON(lcv.country_id = c.id  and cname = 'UK')
    WHERE
        p.gender = 'female') v JOIN

    (SELECT
        p2.id id
    FROM
        t_person p2 JOIN t_last_city_visited lcv2
            ON(lcv2.person_id = p2.id)
        JOIN country c
            ON(lcv.country_id = c.id  and cname != 'USA')
    WHERE
        p.gender = 'female') nv

        ON(v.id = nv.id)

Upvotes: 0

David Aldridge
David Aldridge

Reputation: 52356

Hint: What you want to do here is to find the females for whom there EXISTS a visit to the UK, but where NOT EXISTS a visit to the US.

Something like:

select ...
from   t_person
where  ...
   and exists (select null
                 from t_last_city_visited join
                      t_country on (...)
                where t_country.name = 'UK')
   and not exists (select null
                 from t_last_city_visited join
                      t_country on (...)
                where t_country.name = 'US')

Another approach, to find the people who have visited the UK and not the US, which you can then join to the people to filter by gender:

select   person_id
  from   t_last_city_visited join
         t_country on t_last_city_visited.country_id = t_country.id
 where   t_country.name in ('US','UK')
group by person_id
having   max(t_country.name) = 'UK'

Upvotes: 6

zedfoxus
zedfoxus

Reputation: 37069

Could you please run analyze and execute this query?

-- females who visited UK
with uk_person as (
  select distinct person_id
  from t_last_city_visited t
  inner join t_person p on t.person_id = p.id and 'F' = p.gender
  where country_id  = (select id from t_country where name = 'UK')
),
-- females who visited US
us_person as (
  select distinct person_id
  from t_last_city_visited t
  inner join t_person p on t.person_id = p.id and 'F' = p.gender
  where country_id  = (select id from t_country where name = 'US')
)
-- females who visited UK but not US
select uk.person_id
from uk_person uk
left join us_person us on uk.person_id = us.person_id
where us.person_id is null

This is one of the many ways this query can be formed. You might have to run them to find out which one works best and indexing tweaks you may need to make to have them run faster.

Upvotes: 0

Related Questions