Reputation: 2053
I have the following data frame:
id<-c(1,1,2,3,3)
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
df<-data.frame(id,date)
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y")
id date date2
1 23-01-08 2008-01-23
1 01-11-07 2007-11-01
2 30-11-07 2007-11-30
3 17-12-07 2007-12-17
3 12-12-08 2008-12-12
Now I want to extract a random sample of ids rather than of rows. Specifically, I am looking for a way to randomly pick two of the ids and extract all records related to them. For instance, if it randomly picks ids 2 and 3, the output data frame should look like:
id date date2
2 30-11-07 2007-11-30
3 17-12-07 2007-12-17
3 12-12-08 2008-12-12
Any help would be appreciated.
Upvotes: 5
Views: 16934
Reputation: 13580
Using sqldf:
library(sqldf)
a <- sqldf("SELECT DISTINCT id FROM df ORDER BY RANDOM(*) LIMIT 2")
sqldf("SELECT * FROM df WHERE id IN a")
Output:
id date date2
1 1 23-01-08 2008-01-23
2 1 01-11-07 2007-11-01
3 3 17-12-07 2007-12-17
4 3 12-12-08 2008-12-12
Upvotes: 0
Reputation: 2203
You can use the sample() function.
set.seed(2)
df[df$id %in% sample(unique(df$id), 2), ]
sample() picks two of the unique ids at random, and %in% then selects every row of df whose id was picked. (Using match() here would return only the first row for each id.)
For more information, see ?sample
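To see how match() and %in% differ when an id appears on several rows, here is a small sketch using the question's data. The picked vector is a hypothetical fixed stand-in for the result of sample(), so that the outcome does not depend on the random draw:

```r
# Rebuild the question's data frame so the example is self-contained
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)

# Hypothetical fixed pick, standing in for sample(unique(df$id), 2)
picked <- c(2, 3)

# match() returns the position of the FIRST occurrence of each id only
first_rows <- df[match(picked, df$id), ]

# %in% flags every row whose id was picked, so all records are kept
all_rows <- df[df$id %in% picked, ]
```

Here first_rows has two rows (one per id), while all_rows has three, including both records for id 3.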
Upvotes: 3
Reputation: 1596
First, draw the random sample of ids:
s_ids <- sample(unique(df$id), 2)
Then select the matching records in df:
new_df <- df[df$id %in% s_ids, ]
Upvotes: 1
Reputation: 887811
Or using dplyr
library(dplyr)
df %>%
filter(id %in% sample(unique(id),2))
# id date date2
#1 2 30-11-07 2007-11-30
#2 3 17-12-07 2007-12-17
#3 3 12-12-08 2008-12-12
Or
df %>%
select(id) %>%
unique() %>%
sample_n(2) %>%
semi_join(df, .)
# id date date2
#1 1 23-01-08 2008-01-23
#2 1 01-11-07 2007-11-01
#3 2 30-11-07 2007-11-30
Upvotes: 4
Reputation: 206536
You can randomly pick two IDs using sample()
chosen <- sample(unique(df$id), 2)
and then extract those records
subset(df, id %in% chosen)
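If this is needed repeatedly, the two steps could be wrapped in a small helper. The following is only a sketch: the function name sample_by_id and its arguments are made up for illustration. It samples positions with seq_along() rather than passing the ids directly, because sample(x, n) treats a single number x >= 1 as 1:x, which would give unexpected ids if only one numeric id were present.

```r
# Hypothetical helper: keep all rows belonging to n randomly chosen ids
sample_by_id <- function(data, id_col, n) {
  ids <- unique(data[[id_col]])
  stopifnot(n <= length(ids))
  # Sample positions, not values, to avoid sample()'s 1:x shortcut
  chosen <- ids[sample(seq_along(ids), n)]
  data[data[[id_col]] %in% chosen, , drop = FALSE]
}

# Rebuild the question's data frame so the example is self-contained
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)

set.seed(1)  # only to make the draw reproducible
result <- sample_by_id(df, "id", 2)
```

Whichever two ids are drawn, result contains every row of df for those ids and nothing else.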
Upvotes: 8