liltitus27
liltitus27

Reputation: 1750

Select Grouped Rows Having a certain Property

I'm trying to select only certain rows containing a certain property. Here's a sample of the data I'm working with:

src_id                                              cand_source
------                                              -----------
201609-004d7bgNDFXuIrQPXwsXrOptt2PdTdeXsjV5RJ6_mEQ  mcp
201609-004d7bgNDFXuIrQPXwsXrOptt2PdTdeXsjV5RJ6_mEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ  mc2
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q  mcp
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q  mcp
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q  mc2
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q  mcp
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q  mc2
201609-01noPFGBCqbH9jUB9MHNqPynjqW8cr24LJY917vSGTs  mc2
201609-01noPFGBCqbH9jUB9MHNqPynjqW8cr24LJY917vSGTs  mc2
201609-02ISoPEX0VVkQ0ogot49Q-e7K39Zyk2vdN1rB4Q-kl0  mc2
201609-02ISoPEX0VVkQ0ogot49Q-e7K39Zyk2vdN1rB4Q-kl0  mc2
201609-02LVZ8UqAaz7JCp3RAOTiIE7zH2mveiSQPBo6I6dHDc  mc2
201609-02LVZ8UqAaz7JCp3RAOTiIE7zH2mveiSQPBo6I6dHDc  mc2
201609-03dLH32kaKYVwIj4HiT1tZjCNgqgXiG-fvezX3S9QI4  mc2
201609-03dLH32kaKYVwIj4HiT1tZjCNgqgXiG-fvezX3S9QI4  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mcp
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4  mc2
201609-04JzR3AMxsfQvAeq1MAgjCtMhcaqt2Z_WNmuUlYLrLM  mc2
201609-04JzR3AMxsfQvAeq1MAgjCtMhcaqt2Z_WNmuUlYLrLM  mcp

What I want to do is select only the src_ids having at least one cand_source equaling mcp. Here's what I've tried:

SELECT *
FROM schema.table
WHERE src_id IN (
    SELECT src_id
    FROM schema.table
    WHERE batch_id = ?
    GROUP BY src_id
    HAVING count(cand_source = 'mcp') > 1
)
ORDER BY src_id,
    match_score DESC

That, however, keeps giving me back clusters of src_ids that have no cand_sources equaling mcp.


It was pointed out that I'm simply overcomplicating matters. Here's the solution:

SELECT *
FROM schema.table
WHERE src_id IN (
    SELECT DISTINCT src_id
    FROM schema.table
    WHERE batch_id = ?
        AND cand_source = 'mcp'
)
ORDER BY src_id,
    match_score DESC

Upvotes: 1

Views: 49

Answers (2)

Matt
Matt

Reputation: 14341

If you simply want the src_id that has mcp then a straight query with a WHERE clause is enough no need for conditional aggregation or anything.

SELECT DISTINCT 
    src_id
FROM
    Table
WHERE
    cand_source = 'mcp'
    AND batch_id = ?

If you want all of the records for each src_id that has at least 1 cand_source you can then join that back to the table to receive all of the records.

SELECT t.*
FROM
    Table t
INNER JOIN 
    (SELECT DISTINCT src_id
     FROM Table
     WHERE cand_source = 'mcp'
       AND batch_id = ? ) d ON t.src_id = d.src_id
                            AND t.batch_id = ?

Or you can use a Common Table Expression with the awesome window functions to do it.

WITH cte AS 
(
    SELECT *, COUNT(CASE WHEN cand_source = 'mcp' THEN cand_source END) OVER (PARTITION BY src_id) as McpCount
    FROM
       Table
    WHERE
       batch_id = ?

)
SELECT *
FROM
    cte
WHERE
    McpCount > 0;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269773

If you want just the source ids, then your subquery is all you need. But, you want to count the number of matching values. Here is verbose logic:

SELECT src_id
FROM schema.table
WHERE batch_id = ?
GROUP BY src_id
HAVING SUM(case when cand_source = 'mcp' then 1 else 0 end) > 1

A more concise version is:

HAVING SUM(cand_source = 'mcp'::int) > 1

Upvotes: 1

Related Questions