fabian
fabian

Reputation: 1881

How can I select a row within a group based on a condition in SQL?

Question Please consider the following table:

+--------------+--------+--------+
| transactionID | Sgroup | Rgroup |
+--------------+--------+--------+
|            1 | A      | I      |
|            1 | A      | J      |
|            2 | B      | B      |
|            2 | B      | K      |
+--------------+--------+--------+

For each transactionID (2 rows are associated with ID 1, two rows with ID 2) I want to select the row for which Sgroup = Rgroup, if any row within a transactionID satisfies the condition. Otherwise, I want to select a row at random. For each transactionID at most one row satisfies Sgroup = Rgroup. How can I do this?

Attempted Solution I know how to select rows for which the condition Sgroup = Rgroup is fulfilled as follows:

SELECT *
FROM Transaction 
WHERE Sgroup = Rgroup;

+---------------+--------+--------+
| transactionID | Sgroup | Rgroup |
+---------------+--------+--------+
|             2 | B      | B      |
+---------------+--------+--------+

I also know how to chose a row randomly (thanks to this question) if the condition is not fulfilled as follows:

SELECT * FROM
(SELECT *
FROM Transaction
WHERE NOT transactionID IN
(SELECT transactionID
FROM Transaction 
WHERE Sgroup = Rgroup)
ORDER BY RAND()) AS temp
GROUP BY temp.transactionID;

+---------------+--------+--------+
| transactionID | Sgroup | Rgroup |
+---------------+--------+--------+
|             1 | A      | I      |
+---------------+--------+--------+

How can I combine these two expressions into one? I tried working with a CASE expression I didn't get far. Can somebody kindly suggest a solution?

Example Code Here is the code to generate the table:

CREATE DATABASE MinimalExample;
USE MinimalExample;

CREATE TABLE Transaction (
transactionID int,
Sgroup nvarchar(1),
Rgroup nvarchar(1)
);

INSERT INTO Transaction VALUES
(1,'A','I'), 
(1,'A','J'),
(2,'B','B'),
(2,'B','K');

Upvotes: 3

Views: 138

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1271231

I think variables might be the simplest solution if you really mean "random":

select t.*
from (select t.*,
             (@rn := if(@i = transactionID, @rn + 1,
                        if(@i := transactionID, 1, 1)
                       )
             ) as rn
      from (select t.*
            from t
            order by transactionID, (sgroup = rgroup) desc, rand()
           ) t cross join
           (select @i := -1, @rn := 0) params
     ) t
where rn = 1;

If by "random" you mean "arbitrary", you can use this quick-and-dirty trick:

(select t.*
 from t
 where sgroup = rgroup
)
union all
(select t.*
 from t
 where not exists (select 1 from t t2 where t2.id = t.id and t2.sgroup = t2.rgroup)
 group by transactionID
);

This uses the dreaded select * with group by, something which I strongly discourage using under almost all circumstances. However, in this case, you are specifically trying to reduce each group to an indeterminate row, so it doesn't seem quite so bad. I will note that MySQL does not guarantee that the columns in the result set all come from the same row, although in practice they do.

Finally, if you had a unique primary key on each row, you could use probably the simplest solution:

select t.*
from t
where t.id = (select t2.id
              from t t2
              where t2.transactionID = t.transactionID
              order by (rgroup = sgroup) desc, rand()
             );

Upvotes: 3

Related Questions