Nikhil Kumar Singh
Nikhil Kumar Singh

Reputation: 33

Redundant values while fetching distinct values from column after joins

While I was trying to fetch unique email ids from my postgres database, I am still getting redundant values. The query is as follows :

select distinct(t2.email_id), user_id, registration_date, 
  last_login, status, count_uo  
from (
  select t1.* 
  from (
    select distinct(u.email_id), u.user_id, 
      u.registration_date, u.last_login, 
      u.status, count(distinct(uo.id)) as count_uo 
    from users u 
    join user_offers uo on u.user_id = uo.user_id 
      and u.email_id != ''  
      and uo.offer_id in ('13', '9', 18, 7, 19, 25) 
    join user_utils uu on u.user_id = uu.user_id 
      and uu.carrier ~* 'Airtel' 
      or  uu.carrier ~* 'Jio' 
      or  uu.carrier ~* 'Idea' 
      or  uu.carrier ~* '!dea' 
    where u.registration_date::date between date'2016-08-04' and date'2017-09-28' 
      and u.last_login::date between date'2017-06-01' and date'2017-09-29' 
      and u.gender = 'm' 
      and u.status = 'sms-verified' 
      and u.email_verification_status = 'UN-VERIFIED' 
      and u.email_id != '' group by u.user_id
    ) as t1 
  where t1.count_uo >1 and t1.count_uo < 100
) t2; 

I get the output as follows, even after applying distinct twice.

email_id       | user_id |     registration_date      |         last_login         |    status    | count_uo 
---------------+---------+----------------------------+----------------------------+--------------+----------
 [email protected] |     509 | 2017-07-26 16:59:50.608219 | 2017-07-26 17:56:54.88664  | sms-verified |        3
 [email protected] |     518 | 2017-08-18 19:26:45.217283 | 2017-08-22 15:38:01.591841 | sms-verified |        3
 [email protected] |     512 | 2017-08-17 12:01:00.003048 | 2017-08-21 17:52:56.303841 | sms-verified |        3

Since I'm weak in SQL, any help will be appreciated very much.

Upvotes: 2

Views: 71

Answers (2)

bitifet
bitifet

Reputation: 3669

You have two users (rows) with '[email protected]' as email_id: Notice that they have distinct value in user_id column (509 and 512).

As @GordonLinoff said, you can hide one of that results by using DISTINCT ON clause. But I figure out that it's not what you want...

I imagine it's more likely you inserted some test data and duplicated '[email protected]' in it.

This also point out (I think) a mistake in your model definition. (missing UNIQUE constraints over both email_id and user_id columns in your users table to avoid it could happen again I mean).

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269803

If you are using Postgres, you can use distinct on:

select distinct on (t2.email_id) t2.email_id, user_id,
       registration_date, last_login, status, count_uo  
from ( . . . ) t2
order by t2.email_id;

You can add a second key to the order by to get a particular row (say the most recent login by using order by t2.email_id, last_login desc).

Upvotes: 1

Related Questions