Tom Wenseleers

Reputation: 8019

R: Subset dataframe based on list of allowed factor levels

I am looking for a function that returns the indices of the rows of a data frame mydata

mydata=data.frame(group1=c(rep("MALE",6),rep("FEMALE",6)),group2=c(rep("TREATED",3),rep("UNTREATED",3)))
mydata
   group1    group2
1    MALE   TREATED
2    MALE   TREATED
3    MALE   TREATED
4    MALE UNTREATED
5    MALE UNTREATED
6    MALE UNTREATED
7  FEMALE   TREATED
8  FEMALE   TREATED
9  FEMALE   TREATED
10 FEMALE UNTREATED
11 FEMALE UNTREATED
12 FEMALE UNTREATED

for which the named columns equal particular factor levels, specified as a named list:

selection=list(group1="MALE",group2="TREATED")

In this example, the function would return the vector of matching row indices:

c(1,2,3)

What would be the easiest and fastest way to do this without using loops?

PS: The list selection could be of any length, and the data frame could have any number of columns with any names.

(I know subset, but this is not quite what I am looking for)

EDIT: The function below does what I describe, but it is not elegant, so I am wondering whether a built-in function already does this:

mydata=data.frame(group1=c(rep("MALE",6),rep("FEMALE",6)),group2=c(rep("TREATED",3),rep("UNTREATED",3)))
selection=list(group1="MALE",group2="TREATED")

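selrows=function(mydata,selection) {
  nms=names(selection)
  # one logical column per selected variable, TRUE where the row matches
  sel=data.frame(matrix(TRUE,nrow=nrow(mydata),ncol=length(nms)))
  # seq_along() avoids the 1:0 pitfall when selection is empty
  for (i in seq_along(nms)) { sel[,i]=(mydata[,nms[[i]]]==selection[[nms[[i]]]]) }
  # keep the rows where every condition holds
  which(apply(sel*1,1,prod)==1)
}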

selrows(mydata,selection)
1 2 3

Upvotes: 1

Views: 1124

Answers (1)

RHertel

Reputation: 23818

Maybe this helps:

which(mydata[,1] %in% unlist(selection) & mydata[,2] %in% unlist(selection))
#[1] 1 2 3
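
Note that the one-liner above hard-codes two columns and tests each of them against every value in selection, so it relies on the levels not being shared between columns. A possible generalization is sketched below, assuming every name in selection is a column of mydata; the variable names matches and rows are made up for illustration.

```r
mydata <- data.frame(group1 = c(rep("MALE", 6), rep("FEMALE", 6)),
                     group2 = rep(c(rep("TREATED", 3), rep("UNTREATED", 3)), 2))
selection <- list(group1 = "MALE", group2 = "TREATED")

# Compare each named column only with its own allowed level(s)
matches <- Map(function(col, allowed) col %in% allowed,
               mydata[names(selection)], selection)

# AND the per-column results together and return the matching row indices
rows <- which(Reduce(`&`, matches))
rows
#[1] 1 2 3
```

Because %in% is used for the comparison, each element of selection may also hold several allowed levels, for example group2 = c("TREATED", "UNTREATED").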

Upvotes: 1
