AliCivil

Reputation: 2053

Random sample from a data frame in R

I have the following data frame:

id<-c(1,1,2,3,3)
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
df<-data.frame(id,date)
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y")

id     date      date2
1   23-01-08 2008-01-23
1   01-11-07 2007-11-01
2   30-11-07 2007-11-30
3   17-12-07 2007-12-17
3   12-12-08 2008-12-12

Now I want to extract a random sample of ids, not of rows. Specifically, I am looking for a way to randomly pick two of the ids and extract all records related to them. For instance, if ids 2 and 3 are picked at random, the output data frame should look like:

id     date      date2
2   30-11-07 2007-11-30
3   17-12-07 2007-12-17
3   12-12-08 2008-12-12

Any help would be appreciated.

Upvotes: 5

Views: 16934

Answers (5)

mpalanco

Reputation: 13580

Using sqldf:

library(sqldf)
a <- sqldf("SELECT DISTINCT id FROM df ORDER BY RANDOM() LIMIT 2")
sqldf("SELECT * FROM df WHERE id IN a")

Output:

  id     date      date2
1  1 23-01-08 2008-01-23
2  1 01-11-07 2007-11-01
3  3 17-12-07 2007-12-17
4  3 12-12-08 2008-12-12

Upvotes: 0

user1021713

Reputation: 2203

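You can use the sample() function:

set.seed(2)
df[df$id %in% sample(unique(df$id), 2), ]

sample() draws two of the distinct ids at random, and %in% then keeps every row of df whose id is among them. (Matching the drawn ids back with match() would return only the first row for each id, so ids with several records would lose rows.) set.seed() makes the draw reproducible. For more information, see ?sample.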
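The difference between match() and %in% matters here because ids repeat: match() returns only the position of the first row for each id, while %in% flags every row whose id is in the set. A minimal comparison on the question's data, with ids 2 and 3 fixed rather than sampled so the result is repeatable:

```r
# Example data from the question
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Fixed ids instead of a random draw, so the comparison is repeatable
chosen <- c(2, 3)

# match() returns one position per chosen id: the FIRST row with that id
first_only <- df[match(chosen, df$id), ]

# %in% is TRUE for EVERY row whose id is among the chosen ids
all_rows <- df[df$id %in% chosen, ]

nrow(first_only)  # 2: the second record for id 3 is missing
nrow(all_rows)    # 3
```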

Upvotes: 3

Diego Aguado

Reputation: 1596

First, draw the sample of ids:

s_ids <- sample(unique(df$id), 2)

Then select the matching records from df:

new_df <- df[df$id %in% s_ids, ]
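If this is needed repeatedly, the two steps can be wrapped in a small function. The sketch below uses a hypothetical helper, sample_by_id(), which is not part of base R or any package; it assumes the id column is passed by name as a string and that n is at most the number of distinct ids:

```r
# Example data from the question
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Hypothetical helper (not from any package): keep every row belonging to
# n randomly chosen ids. Passing `seed` makes the draw reproducible.
sample_by_id <- function(data, id_col, n, seed = NULL) {
  ids <- unique(data[[id_col]])
  if (n > length(ids)) stop("n exceeds the number of distinct ids")
  if (!is.null(seed)) set.seed(seed)
  # Sample positions, then index, so a single remaining id is handled correctly
  chosen <- ids[sample.int(length(ids), n)]
  data[data[[id_col]] %in% chosen, , drop = FALSE]
}

new_df <- sample_by_id(df, "id", 2, seed = 42)
new_df
```

Setting the seed inside the function resets the global random number stream as a side effect; calling set.seed() once before the call is an equally reasonable design.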

Upvotes: 1

akrun

Reputation: 887811

Or, using dplyr:

library(dplyr)
df %>% 
    filter(id %in% sample(unique(id),2))
#  id     date      date2
#1  2 30-11-07 2007-11-30
#2  3 17-12-07 2007-12-17
#3  3 12-12-08 2008-12-12

Or

df %>%
     select(id) %>%
     unique() %>%
     sample_n(2) %>%
     semi_join(df, .)
#  id     date      date2
#1  1 23-01-08 2008-01-23
#2  1 01-11-07 2007-11-01
#3  2 30-11-07 2007-11-30
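For readers without dplyr, the second pipeline translates roughly into base R as shown here. semi_join() keeps the rows of df that have a matching id in the sampled table and adds no columns, which for a single key column amounts to an %in% filter. This is a sketch of the equivalent logic, not how dplyr implements it:

```r
# Example data from the question
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# like select(id) %>% unique(): a one-column table of distinct ids
ids_tbl <- unique(df["id"])

# like sample_n(2): two random rows of that table
picked <- ids_tbl[sample.int(nrow(ids_tbl), 2), , drop = FALSE]

# like semi_join(df, picked): rows of df whose id appears in picked
result <- df[df$id %in% picked$id, ]
result
```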

Upvotes: 4

MrFlick

Reputation: 206536

You can randomly pick two IDs using sample()

chosen <- sample(unique(df$id), 2)

and then extract those records

subset(df, id %in% chosen)
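One caveat applies to the sample(unique(df$id), 2) call above and in the other answers: according to ?sample, when x is a single numeric value with x >= 1, sampling takes place from 1:x. The question's data have three distinct ids, so this never triggers there, but it can if the data have been filtered down to one id first. The examples in ?sample suggest indexing by position instead. The id 7 below is made up for illustration:

```r
# Suppose filtering left only one distinct id, say 7 (hypothetical value)
ids <- 7

# sample() treats a single number x >= 1 as 1:x, so this draws from 1:7
set.seed(1)
draws <- sample(ids, 20, replace = TRUE)
any(draws != 7)   # values other than 7 show up, i.e. ids that do not exist

# Indexing by position always draws from the ids themselves
# (the "resample" idiom from the examples in ?sample)
safe <- ids[sample.int(length(ids), 20, replace = TRUE)]
all(safe == 7)    # TRUE
```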

Upvotes: 8
