sym246

Reputation: 1866

Select rows from a data frame according to another vector, including repetitions

Example data:

dates=seq(as.POSIXct("2015-01-01 00:00:00"), as.POSIXct("2015-01-07 00:00:00"), by="day")
data=rnorm(7,1,2)
groupID=c(12,14,16,24,35,46,54)

DF=data.frame(Date=dates,Data=data,groupID=groupID)

BB=c(12,12,16,24,35,35)
DF[DF$groupID %in% BB,]

        Date       Data groupID
1 2015-01-01  4.4104202       12
3 2015-01-03  2.1557735       16
4 2015-01-04 -0.9880946       24
5 2015-01-05 -0.3396025       35

I need to filter the data frame DF to the rows whose groupID matches a value in my vector BB. However, if BB contains repeated values, the repetitions are not reflected in the result.

Since my vector BB contains 12 twice and 35 twice, the output should in fact be:

        Date       Data groupID
1 2015-01-01  4.4104202       12
1 2015-01-01  4.4104202       12
3 2015-01-03  2.1557735       16
4 2015-01-04 -0.9880946       24
5 2015-01-05 -0.3396025       35
5 2015-01-05 -0.3396025       35

Is there a way to achieve this? And to keep the ordering of the vector BB if possible?

Upvotes: 0

Views: 231

Answers (2)

Bayesric

Reputation: 349

You can convert BB into a data.frame and use merge() to join DF and BB on their groupID column. Specifically:

dates=seq(as.POSIXct("2015-01-01 00:00:00"), as.POSIXct("2015-01-07 00:00:00"), by="day")
groupID=c(12,14,16,24,35,46,54)
set.seed(1234)
data=rnorm(7,1,2)
DF=data.frame(Date=dates,Data=data,groupID=groupID)
BB=data.frame(groupID=c(12,12,16,24,35,35))

Test result:

>merge(DF,BB,by="groupID")
  groupID       Date      Data
1      12 2015-01-01 -1.414131
2      12 2015-01-01 -1.414131
3      16 2015-01-03  3.168882
4      24 2015-01-04 -3.691395
5      35 2015-01-05  1.858249
6      35 2015-01-05  1.858249
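The question also asks to keep the ordering of BB. By default merge() sorts the result on the by column, so the output above follows the order of BB only because BB is already sorted. A minimal sketch of one way to restore the original order, assuming an unsorted BB (the values below and the helper column name pos are made up for illustration):

```r
set.seed(1234)
DF <- data.frame(
  Date    = seq(as.POSIXct("2015-01-01 00:00:00"),
                as.POSIXct("2015-01-07 00:00:00"), by = "day"),
  Data    = rnorm(7, 1, 2),
  groupID = c(12, 14, 16, 24, 35, 46, 54)
)

# Hypothetical unsorted lookup vector (not the BB from the question)
BB <- c(35, 12, 35, 16)

# Record each element's position in BB so the order can be restored afterwards
lookup <- data.frame(groupID = BB, pos = seq_along(BB))

# merge() sorts on groupID by default, so re-order by the saved position
merged <- merge(DF, lookup, by = "groupID")
result <- merged[order(merged$pos), c("Date", "Data", "groupID")]

result$groupID
```

The pos column exists only to carry the order through the join and is dropped in the final column selection.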

Upvotes: 0

bgoldst

Reputation: 35314

Use match() (or findInterval()):

DF[match(BB,DF$groupID),];
##           Date      Data groupID
## 1   2015-01-01 1.2199835      12
## 1.1 2015-01-01 1.2199835      12
## 3   2015-01-03 1.8141556      16
## 4   2015-01-04 0.2748579      24
## 5   2015-01-05 3.2030200      35
## 5.1 2015-01-05 3.2030200      35

(Note that the Data column is different because you used rnorm() to generate it without calling set.seed() first. It is recommended to call set.seed() in any code sample where you incorporate randomness so that exact results can be reproduced.)
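One caveat with the match() approach: a value in BB that is missing from DF$groupID yields NA in the index, and indexing with NA produces a row filled with NA. A small sketch of dropping such values first, assuming that silently skipping them is acceptable (the value 99 is a made-up ID that is not in DF):

```r
set.seed(1234)
DF <- data.frame(
  Date    = seq(as.POSIXct("2015-01-01 00:00:00"),
                as.POSIXct("2015-01-07 00:00:00"), by = "day"),
  Data    = rnorm(7, 1, 2),
  groupID = c(12, 14, 16, 24, 35, 46, 54)
)

# Hypothetical lookup vector: unsorted, with a repeat and one absent ID (99)
BB <- c(35, 12, 99, 35)

# match() returns, for each element of BB, the position of its first
# occurrence in DF$groupID, or NA when there is none
idx <- match(BB, DF$groupID)

# Drop unmatched entries so no all-NA rows appear in the result
result <- DF[idx[!is.na(idx)], ]

result$groupID
```

Because match() only returns the first matching position, this assumes each groupID appears at most once in DF, as in the example data.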

Upvotes: 1
