Reputation: 4299
I need to randomly select a diary for each individual (id) but only for those who filled more than one.
Let us suppose my data look like this
dta = rbind(c(1, 1, 'a'),
c(1, 2, 'a'),
c(1, 3, 'b'),
c(2, 1, 'a'),
c(3, 1, 'b'),
c(3, 2, 'a'),
c(3, 3, 'c'))
colnames(dta) <- c('id', 'DiaryNumber', 'type')
dta = as.data.frame(dta)
dta
id DiaryNumber type
1 1 a
1 2 a
1 3 b
2 1 a
3 1 b
3 2 a
3 3 c
For example, id 1 filled 3 diaries. What I need is to randomly select one of the 3 diaries. Id 2 only filled one diary, so I do not need to do anything with it.
I have no idea how I could do that. Any ideas ?
Upvotes: 1
Views: 1192
Reputation: 13570
Base package:
set.seed(123)
df <- lapply(split(dta, dta$id), function(x) x[sample(nrow(x), 1), ])
do.call("rbind", df)
Output:
id DiaryNumber type
1 1 1 a
2 2 1 a
3 3 2 a
Upvotes: 1
Reputation: 193517
You can use sample_n
:
library(dplyr)
dta %>% group_by(id) %>% sample_n(1)
## Source: local data frame [3 x 3]
## Groups: id
##
## id DiaryNumber type
## 1 1 2 a
## 2 2 1 a
## 3 3 1 b
Upvotes: 4