Poca
Poca

Reputation: 31

Subset data based on vector with repeated observations

I have the following data with two observations per subject:

SUBJECT <- c(8,8,10,10,11,11,15,15)
POSITION <- c("H","L","H","L","H","L","H","L")
TIME <- c(90,90,30,30,30,30,90,90)
RESPONSE <- c(5.6,5.2,0,0,4.8,4.9,1.2,.9)

DATA <- data.frame(SUBJECT,POSITION,TIME,RESPONSE)

I want the rows of DATA which have SUBJECT numbers that are in a vector, V:

V <- c(8,10,10)

How can I obtain both observations from DATA whose SUBJECT number is in V and have those observations repeated the same number of times as the corresponding SUBJECT number appears in V?

Desired result:

SUBJECT <- c(8,8,10,10,10,10)
POSITION <- c("H","L","H","L","H","L")
TIME <- c(90,90,30,30,30,30)
RESPONSE <- c(5.6,5.2,0,0,0,0)

OUT <- data.frame(SUBJECT,POSITION,TIME,RESPONSE)

I thought some variation of the %in% operator would do the trick but it does not account for repeated subject numbers in V. Even though a subject number is listed twice in V, I only get one copy of the corresponding rows in DATA.

I could also create a loop and append matching observations but this piece is inside a bootstrap sampler and this option would dramatically increase computation time.

Upvotes: 3

Views: 135

Answers (1)

thelatemail
thelatemail

Reputation: 93813

merge is your friend:

merge(list(SUBJECT=V), DATA)
#  SUBJECT POSITION TIME RESPONSE
#1       8        H   90      5.6
#2       8        L   90      5.2
#3      10        H   30      0.0
#4      10        L   30      0.0
#5      10        H   30      0.0
#6      10        L   30      0.0

As @Frank implies, this logic can be translated to data.table or dplyr or sql of anything else that will handle a left-join.

Upvotes: 5

Related Questions