playbyplay
playbyplay

Reputation: 11

Getting duplicate values assigned with multiple join conditions

I have 4 tables with the columns outlined far below. I am using multiple join conditions to bring in the Activities table, which is resulting in Activity IDs getting duplicated, and mapped to Deal IDs they shouldn't be. Activity IDs are mapped to Account IDs in the DB, hence this hurdle. In the desired output below, deals.id can be null.

I am looking for the output below:

distinct count of activities.id | activities.accountid | deals.id | users.id | users.name

Is there some way to make this happen? Maybe create an index, array, or a new key? Any guidance here would be greatly appreciated! Here is a sample output of the data with just one rep's data to exhibit the issue. I am using Presto; below is my query:

select ac.id activityid, d.id as dealid, u.name, a.id as accountid 
  from accounts a 
  left join activities ac 
    on a.id = ac.accountid 
  join users u 
    on u.id = ac.ownerid 
  left join deals d
    on a.id = d.accountid and d.ownerid = ac.ownerid 
where u.name = 'John'

Accounts:

id |

Activities:

id | accountid | userid

Deals:

id | accountid | userid

Users:

id | name

Upvotes: 1

Views: 58

Answers (1)

Kevin D
Kevin D

Reputation: 98

When I encounter a problem I usually break down the query into smaller queries to try and isolate the problem. So for instance, just join from accounts to activities to see if the duplication is already occurring or if it doesn't occur until you join in deals.

Now on to sorting out the dups. My suggestion would be to try joining all using the deals table as the starting point, so deals left join accounts should only return the account once for each deal, left join activities should return many.

In other words, just try changing your join clause under deals from

on a.id = d.accountid and d.ownerid = ac.ownerid 

to

on ac.accountid = d.accountid and d.ownerid = ac.ownerid 

If that doesn't work, I'd try rewriting the query from deals as a starting point like this... Note that a good practice is to put the join from fields on the left side of the on clause. (eg. select a left join b on a.id = b.id left join c on a.id = c.id, etc..)

select ac.id activityid, d.id as dealid, u.name, a.id as accountid   
from deals d
left join accounts a
    on d.accountid = a.id 
left join activities ac
    on d.ownerid = ac.ownerid
    and d.accountid = ac.accountid
left join users u
    on d.ownerid = u.id
    and u.name = 'John'        

This is making a big assumption that a deal can't be mapped to two activities with the same owner and id. If this is the case, you need to join to activities using deal id if there is a deal id in the activities table.

If this doesn't work, I use subqueries the following to isolate dups. Example...

select ac.id activityid, d.id as dealid, u.name, a.id as accountid   
from (select * from deals where id in (18,7) d
left join (select * from accounts where id = 21) a
    on d.accountid = a.id 
left join (select * from activities where id = 13) ac
    on d.ownerid = ac.ownerid
    and d.accountid = ac.accountid
left join (select * from users where u.Name = 'John') u
    on d.ownerid = u.id       

I hope this helps. It's always hard without recreating the raw tables. Consider creating example tables that match your schema to db-fiddle and posting a link.

Upvotes: 1

Related Questions