Reputation: 41
I am using SQL and obtained the following table:
userID| time | location
A10 | 20130801| 1000
A10 | 20130802| 1002
A10 | 20130806| 1008
B21 | 20130803| 1000
B21 | 20130801| 1099
C11 | 20130802| 1000
D33 | 20130802| 1002
D33 | 20130806| 1877
E01 | 20130801| 1765
E01 | 20130801| 1000
E01 | 20130802| 1000
where userID is String, Time is YYYYMM, and Location is location ID (numeric) for each userID in YYYYMM.
In this example, I have 5 unique userIDs (A10,B21,C11,D33,E01). I would like to write query that randomly sample X percent of unique userIDs (for example, if X=80, randomly sample 4 unique userIDs from 5).
I've written:
Select time,location,
count(DISTINCT userID) as n_uu
from( ---
--- here, I construct the example table
---
) as maintable
where 0.8 >= CAST(CHECKSUM(NEWID(), userID) & 0x7fffffff AS float)/CAST
(0x7fffffff AS int)
group by time, location
where I finally intend to count the number of userIDs for randomly selected samples (i.e., 80% of unique userIDs). That is, I try to get following table in this example (now suppose that B21 is not sampled):
time | location | n_uu
20130801| 1000 | 2
20130801| 1765 | 1
20130802| 1002 | 2
20130802| 1000 | 2
20130806| 1008 | 1
20130806| 1877 | 1
Yet, this seems not to randomly sample unique userIDs, rather it randomly select rows.
How to fix it, or can I do this in faster by using other query? Any advice would be very appreciated.
===Added =========================
Select time,location,
count(DISTINCT userID) as n_uu
from(
(Select
--- here, I construct the example table
---
) as maintable
Select maintable.*
from maintable join
(Select top 80 percent userID
from (Select Distinct userID from maintable) newtable
order by NEWID()
) newtable
on maintable.userID = newtable.userID
)
group by time, location
Upvotes: 1
Views: 788
Reputation: 1269633
Hmmm. If I understand correctly, you want to sample within a hierarchy. So, get the user ids you want with a subquery and then join in the rest of the information:
select t.*
from t join
(select top 80 percent userid
from (select distinct userid from t) u
order by newid()
) u
on t.userid = u.userid;
Upvotes: 4