Reputation: 2795
The project is using Postgres 9.3
I have tables (that I have simplified) as follows:
t_person (30 million records)
- id
- first_name
- last_name
- gender
t_city (70,000 records)
- id
- name
- country_id
t_country (20 records)
- id
- name
t_last_city_visited (over 200 million records)
- person_id
- city_id
- country_id
- There is a unique constraint on person_id, country_id to
ensure that each person only has one last city per country
What I need to do are variations on the following:
Get the ids of Person who are female who have visited country 'UK' but have never visited country 'USA'
I have tried the following, but it is too slow.
select t_person.id from t_person
join t_last_city_visited
on (
t_last_city_visited.person_id = t_person.id
and country_id = (select id from t_country where name = 'UK')
)
where gender = 'female'
except
(
select t_person.id from t_person
join t_last_city_visited
on (
t_last_city_visited.person_id = t_person.id
and country_id = (select id from t_country where name = 'USA')
)
)
I would really appreciate any help.
Upvotes: 2
Views: 5903
Reputation: 6572
This is the way I would approach it, you can later substitute the inner queries by a with alias as @zedfoxus said
select
id
from
(SELECT
p.id id
FROM
t_person p JOIN t_last_city_visited lcv
ON(lcv.person_id = p.id)
JOIN country c
ON(lcv.country_id = c.id and cname = 'UK')
WHERE
p.gender = 'female') v JOIN
(SELECT
p2.id id
FROM
t_person p2 JOIN t_last_city_visited lcv2
ON(lcv2.person_id = p2.id)
JOIN country c
ON(lcv.country_id = c.id and cname != 'USA')
WHERE
p.gender = 'female') nv
ON(v.id = nv.id)
Upvotes: 0
Reputation: 52356
Hint: What you want to do here is to find the females for whom there EXISTS a visit to the UK, but where NOT EXISTS a visit to the US.
Something like:
select ...
from t_person
where ...
and exists (select null
from t_last_city_visited join
t_country on (...)
where t_country.name = 'UK')
and not exists (select null
from t_last_city_visited join
t_country on (...)
where t_country.name = 'US')
Another approach, to find the people who have visited the UK and not the US, which you can then join to the people to filter by gender:
select person_id
from t_last_city_visited join
t_country on t_last_city_visited.country_id = t_country.id
where t_country.name in ('US','UK')
group by person_id
having max(t_country.name) = 'UK'
Upvotes: 6
Reputation: 37069
Could you please run analyze and execute this query?
-- females who visited UK
with uk_person as (
select distinct person_id
from t_last_city_visited t
inner join t_person p on t.person_id = p.id and 'F' = p.gender
where country_id = (select id from t_country where name = 'UK')
),
-- females who visited US
us_person as (
select distinct person_id
from t_last_city_visited t
inner join t_person p on t.person_id = p.id and 'F' = p.gender
where country_id = (select id from t_country where name = 'US')
)
-- females who visited UK but not US
select uk.person_id
from uk_person uk
left join us_person us on uk.person_id = us.person_id
where us.person_id is null
This is one of the many ways this query can be formed. You might have to run them to find out which one works best and indexing tweaks you may need to make to have them run faster.
Upvotes: 0