Reputation: 1750
I'm trying to select only certain rows containing a certain property. Here's a sample of the data I'm working with:
src_id cand_source ------ ----------- 201609-004d7bgNDFXuIrQPXwsXrOptt2PdTdeXsjV5RJ6_mEQ mcp 201609-004d7bgNDFXuIrQPXwsXrOptt2PdTdeXsjV5RJ6_mEQ mc2 201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2 201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2 201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2 201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2 201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2 201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2 201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q mcp 201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q mcp 201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q mc2 201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q mcp 201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q mc2 201609-01noPFGBCqbH9jUB9MHNqPynjqW8cr24LJY917vSGTs mc2 201609-01noPFGBCqbH9jUB9MHNqPynjqW8cr24LJY917vSGTs mc2 201609-02ISoPEX0VVkQ0ogot49Q-e7K39Zyk2vdN1rB4Q-kl0 mc2 201609-02ISoPEX0VVkQ0ogot49Q-e7K39Zyk2vdN1rB4Q-kl0 mc2 201609-02LVZ8UqAaz7JCp3RAOTiIE7zH2mveiSQPBo6I6dHDc mc2 201609-02LVZ8UqAaz7JCp3RAOTiIE7zH2mveiSQPBo6I6dHDc mc2 201609-03dLH32kaKYVwIj4HiT1tZjCNgqgXiG-fvezX3S9QI4 mc2 201609-03dLH32kaKYVwIj4HiT1tZjCNgqgXiG-fvezX3S9QI4 mc2 201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2 201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2 201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2 201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2 201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2 201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2 201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2 201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2 201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2 201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mcp 201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2 201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2 201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2 201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2 201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2 201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2 201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2 201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2 201609-04JzR3AMxsfQvAeq1MAgjCtMhcaqt2Z_WNmuUlYLrLM mc2 201609-04JzR3AMxsfQvAeq1MAgjCtMhcaqt2Z_WNmuUlYLrLM mcp
What I want to do is select only the src_id
s having at least one cand_source
equaling mcp
. Here's what I've tried:
SELECT *
FROM schema.table
WHERE src_id IN (
SELECT src_id
FROM schema.table
WHERE batch_id = ?
GROUP BY src_id
HAVING count(cand_source = 'mcp') > 1
)
ORDER BY src_id,
match_score DESC
That, however, keeps giving me back clusters of src_id
s that have no cand_source
s equaling mcp
.
It was pointed out that I'm simply overcomplicating matters. Here's the solution:
SELECT *
FROM schema.table
WHERE src_id IN (
SELECT DISTINCT src_id
FROM schema.table
WHERE batch_id = ?
AND cand_source = 'mcp'
)
ORDER BY src_id,
match_score DESC
Upvotes: 1
Views: 49
Reputation: 14341
If you simply want the src_id
that has mcp then a straight query with a WHERE
clause is enough no need for conditional aggregation or anything.
SELECT DISTINCT
src_id
FROM
Table
WHERE
cand_source = 'mcp'
AND batch_id = ?
If you want all of the records for each src_id
that has at least 1 cand_source
you can then join that back to the table to receive all of the records.
SELECT t.*
FROM
Table t
INNER JOIN
(SELECT DISTINCT src_id
FROM Table
WHERE cand_source = 'mcp'
AND batch_id = ? ) d ON t.src_id = d.src_id
AND t.batch_id = ?
Or you can use a Common Table Expression with the awesome window functions to do it.
WITH cte AS
(
SELECT *, COUNT(CASE WHEN cand_source = 'mcp' THEN cand_source END) OVER (PARTITION BY src_id) as McpCount
FROM
Table
WHERE
batch_id = ?
)
SELECT *
FROM
cte
WHERE
McpCount > 0;
Upvotes: 1
Reputation: 1269773
If you want just the source ids, then your subquery is all you need. But, you want to count the number of matching values. Here is verbose logic:
SELECT src_id
FROM schema.table
WHERE batch_id = ?
GROUP BY src_id
HAVING SUM(case when cand_source = 'mcp' then 1 else 0 end) > 1
A more concise version is:
HAVING SUM(cand_source = 'mcp'::int) > 1
Upvotes: 1