DXC

Reputation: 75

R: select rows by group after resampling

I want to bootstrap a panel dataset manually. To keep later manipulations consistent, I need to cluster at the individual level, i.e. all observations for the same individual must be selected together in each bootstrap sample. To do this, I resample with replacement from the vector of unique individual IDs and use the result as the index.

df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"), v1 = c(3,1,2,4,2,2,5,6,9), v2 = c(1,0,0,0,1,1,0,1,0))

boot.index <- sample(unique(df$ID), replace = TRUE)

Then I select rows according to the index. Suppose boot.index = (B, B, C); I want a data frame like this:

ID v1 v2
B  4  0
B  2  1
B  2  1
B  4  0 
B  2  1
B  2  1
C  5  0
C  6  1
C  9  0

Apparently df1 <- df[df$ID == boot.index,] does not give what I want. I also tried subset and dplyr's filter, but neither works. Basically, this is an issue of selecting whole groups by a group index. Any suggestions? Thanks!

Upvotes: 0

Views: 496

Answers (3)

d.b

Reputation: 32558

set.seed(42)
boot.index <- sample(unique(df$ID), replace = TRUE)
boot.index
#[1] C C A
#Levels: A B C

do.call(rbind, lapply(boot.index, function(x) df[df$ID == x,]))
#   ID v1 v2
#7   C  5  0
#8   C  6  1
#9   C  9  0
#71  C  5  0
#81  C  6  1
#91  C  9  0
#1   A  3  1
#2   A  1  0
#3   A  2  0
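
Because a resampled individual can appear more than once, later panel operations may need to tell the repeated copies apart. The following is a minimal sketch of the same lapply/rbind idea that tags each draw with its own identifier. The column name boot.ID is a hypothetical choice, and boot.index is fixed to the (B, B, C) example from the question rather than sampled.

```r
# Toy panel from the question (IDs kept as character, not factor)
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"),
                 v1 = c(3,1,2,4,2,2,5,6,9),
                 v2 = c(1,0,0,0,1,1,0,1,0),
                 stringsAsFactors = FALSE)

# Fixed draw from the question, so the result is reproducible
boot.index <- c("B", "B", "C")

# Split the panel once into one data frame per individual
groups <- split(df, df$ID)

# Pull each drawn individual's block and tag it with the draw number.
# boot.ID is a hypothetical column name: it distinguishes the two copies of "B".
boot.list <- lapply(seq_along(boot.index), function(i) {
  g <- groups[[boot.index[i]]]
  g$boot.ID <- i
  g
})

boot.df <- do.call(rbind, boot.list)
rownames(boot.df) <- NULL  # drop the de-duplicated row names
boot.df
```

Grouping on boot.ID instead of ID in later steps treats each draw as its own cluster.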

Upvotes: 0

ags29

Reputation: 2696

You can also do this with a join:

boot.index = c("B", "B", "C")
merge(data.frame("ID"=boot.index), df, by="ID", all.x=TRUE, all.y=FALSE)
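
The join keeps repeated draws because each row of the index data frame is matched against every row of df with the same ID. A small sketch of this, assuming the question's df and carrying a draw counter through the merge (the names draws and draw are hypothetical):

```r
# Toy panel from the question
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"),
                 v1 = c(3,1,2,4,2,2,5,6,9),
                 v2 = c(1,0,0,0,1,1,0,1,0),
                 stringsAsFactors = FALSE)

boot.index <- c("B", "B", "C")

# One row per draw; "draw" is a hypothetical column recording which draw a row came from
draws <- data.frame(draw = seq_along(boot.index), ID = boot.index,
                    stringsAsFactors = FALSE)

# Each draw row matches all rows of df with the same ID, so "B" contributes twice
boot.df <- merge(draws, df, by = "ID")

# merge() sorts by the key, so restore draw order explicitly
boot.df <- boot.df[order(boot.df$draw), ]

table(boot.df$ID)  # B appears 6 times, C appears 3 times
```

Since merge() does not promise to keep the original row order within a group, sort on a time variable afterwards if the panel has one.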

Upvotes: 0

amrrs

Reputation: 6325

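%in% selects the rows of every group that appears in boot.index. Note that each group is returned only once, however many times it was drawn (A is drawn twice below but appears once in the result).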

> df
  ID v1 v2
1  A  3  1
2  A  1  0
3  A  2  0
4  B  4  0
5  B  2  1
6  B  2  1
7  C  5  0
8  C  6  1
9  C  9  0
> boot.index
[1] A B A
Levels: A B C
> df[df$ID %in% boot.index,]
  ID v1 v2
1  A  3  1
2  A  1  0
3  A  2  0
4  B  4  0
5  B  2  1
6  B  2  1

dplyr::filter based solution:

> library(dplyr)
> df %>% filter(ID %in% boot.index)
  ID v1 v2
1  A  3  1
2  A  1  0
3  A  2  0
4  B  4  0
5  B  2  1
6  B  2  1
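
To make the difference concrete, here is a rough sketch contrasting the %in% subset with a row-position lookup that keeps repeated draws. It uses the question's df and the (A, B, A) draw shown above; the names rows.by.id and boot.rows are hypothetical.

```r
# Toy panel from the question
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"),
                 v1 = c(3,1,2,4,2,2,5,6,9),
                 v2 = c(1,0,0,0,1,1,0,1,0),
                 stringsAsFactors = FALSE)

boot.index <- c("A", "B", "A")

# Membership test: a row is either kept or not, so "A" is included once
in.rows <- df[df$ID %in% boot.index, ]

# Row positions per individual, expanded in draw order so repeats are preserved
rows.by.id <- split(seq_len(nrow(df)), df$ID)
boot.rows <- unlist(rows.by.id[boot.index], use.names = FALSE)
boot.df <- df[boot.rows, ]

nrow(in.rows)  # 6
nrow(boot.df)  # 9
```

Indexing by row position avoids building intermediate data frames, which may matter when the resampling is repeated many times.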

Upvotes: 0
