Evgeni
Evgeni

Reputation: 141

SQL Random sample table with duplicates

Assuming I have the following table:

       unitid  |  transtatus |   currency
    ---------------------------------------
    1024393230 | not_started |  GBp - Pence
    1024397398 | in_progress |  GBp - Pence
    1024397398 | not_started |  USd - Cent
    1024397408 | not_started |  GBp - Pence
    1024397408 | not_started |  EUR
    1024401371 | not_started |  GBp - Pence
    1024403375 | in_progress |  GBp - Pence

I want to select random rows for QC I can do it by

select top 3  
tbble.unitid,tbble.transtatus, tbble.currency
 from tbble
order by newid()

However since some rows share the same unitid (and if this is the case) I want to pull all the rows associated with this unitid

So the query would return: (in case random rows only has one row for this unitid)

      unitid   | transtatus  |  currency
    ---------------------------------------
    1024393230 | not_started | GBp - Pence
    1024401371 | not_started | GBp - Pence
    1024403375 | in_progress | GBp - Pence

or : (in case of two rows associated with this unitid)

    1024397398 | in_progress | GBp - Pence
    1024397398 | not_started | USd - Cent
    1024401371 | not_started | GBp - Pence
    1024403375 | in_progress | GBp - Pence

Not really sure how can I achieve this. Maybe counting the number of unitid appearances first and then if count is over 1 then adding these to initial random sample?

Upvotes: 2

Views: 194

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269623

I think this accomplishes what you want, which is three random unit ids and all their rows:

select t.*
from tbble t join
     (select top 3 t.unitid
      from (select distinct t.unitid from tbble t) t
      order by newid()
     ) tt
     on t.unitid = tt.unitid

Upvotes: 1

Mudassir Hasan
Mudassir Hasan

Reputation: 28741

You should use WITH TIES clause to get matching rows

select top 3   WITH TIES *
tbble.unitid,tbble.transtatus, tbble.currency
 from tbble
order by newid()

Upvotes: 1

Daneel
Daneel

Reputation: 1183

Maybe you could use a subquery to work in two steps : first you select some random unitid with the subquery, then for each of them, you select the whole rows in your table with the same unitid. It should look like this :

select tbble.unitid, tbble.transtatus, tbble.currency
from tbble
where tbble.unitid in (select top 3  
tbble.unitid as randomunitid
 from tbble
order by newid())

Upvotes: 2

Related Questions