jsage
jsage

Reputation: 1

Subquery in Snowflake select clause not evaluating row by row?

I'm trying to write a very straightforward query: I want to pull all accounts in a snowflake table, and flag accounts (True/False) that can be linked to a record in the opportunity table with "Startup" in the type field.

Here is a query that I thought would return what I am looking for:

SELECT
    acc.id
    , acc.id IN (
        SELECT DISTINCT(opp.accountid)
        FROM "OPPORTUNITY" opp
        WHERE opp.type LIKE '%Startup%'
        ) AS startup
FROM "ACCOUNT" acc

The account table has ~100,000 distinct records and the subquery returns a list of only ~11,000 distinct account ids, yet the field aliased as 'startup' contains TRUE for each account. I've tried diagnosing why this returns all True with no luck. I also tried writing this same query using EXISTS with a correlated subquery and got the same results.

I was able to build a working query by essentially left joining the sub query after the FROM clause, but I would love to understand why the query above fails to evaluate correctly for each row.

Any help would be appreciated. Thanks!

Upvotes: 0

Views: 1000

Answers (2)

Rich Murnane
Rich Murnane

Reputation: 2920

it'd be interesting for us to see the results of these two queries:

SELECT COUNT(DISTINCT accountid)
FROM   opportunity
WHERE  type LIKE '%Startup%';


SELECT COUNT(DISTINCT accountid)
FROM   opportunity
WHERE  type iLike '%Startup%';

Something like the following might be your ultimate goal I think:

SELECT acc.id,
       startup.accountid
FROM   ACCOUNT acc
LEFT OUTER JOIN (
    SELECT COUNT(DISTINCT accountid) as accountid
    FROM   opportunity
    WHERE  type iLike '%Startup%'
    AND    accountid IS NOT NULL) startup 
ON acc.id = startup.accountid;

**** begin additional info ****

After reading the comments, I decided to play with this a little more, sure enough my test is working, perhaps you have nulls or similar messing up your data or results? I'm not sure but this is working for me!

create table account (id varchar);
insert into account values ('A'), ('B'),('C'),('D');

create table opportunity (accountid varchar, type varchar);
insert into opportunity values ('D', 'New Subscription'), ('X', 'Stuff'), ('Y', 'Stuff'), ('Z', 'Stuff');

select account.id,
       account.id in (select distinct accountid FROM opportunity WHERE type like '%Subscr%') as startup
FROM   account;

ID  STARTUP
A   FALSE
B   FALSE
C   FALSE
D   TRUE

Upvotes: 0

Stuart Ozer
Stuart Ozer

Reputation: 1384

I cannot reproduce the behavior you see:

create or replace table ACCOUNT(id int);
insert into ACCOUNT values (1), (2), (3);
create or replace table OPPORTUNITY(accountid int, type varchar);
insert into OPPORTUNITY values (1, 'Startup'), (2, 'Startup');

Running your query above then produces the result:

Row   ID   STARTUP
1     1    TRUE
2     2    TRUE
3     3    FALSE

Upvotes: 1

Related Questions