den

Reputation: 169

Create a subset of a data.frame by removing specific rows

I am trying to split a "mother" data.frame into three data.frames. The mother data.frame, called dfrm, has several variables, including id (identification), time (three time points), a numerical variable Ht, and a factor fac with 3 levels derived from Ht.

I created 2 data.frames, dfrm2 and dfrm3, using the ddply function, keeping only the subjects that have a given level of the fac variable AT EACH OF THE THREE TIME POINTS:

# 50 subjects, each measured at three time points
id   <- rep(seq(1, 50, 1), 3)
time <- factor(rep(c("day1", "day2", "day3"), c(50, 50, 50)), levels=c("day1", "day2", "day3"), ordered=TRUE)
Ht   <- rnorm(150, mean=30, sd=3)
A    <- rnorm(150, mean=7, sd=10)
# data.frame() keeps time as an ordered factor; cbind() would coerce it to numeric codes
df   <- data.frame(id, time, Ht, A)
head(df)
# Bin Ht into three levels
fac  <- cut(df$Ht, breaks=c(1, 30, 35, 100), labels=c("<30%", "<35%", ">35%"), include.lowest=TRUE)
dfrm <- data.frame(df, fac)

library(plyr)
dfrm2  <-  ddply(dfrm, "id", function(x) if(all(x$fac=="<30%")) x else NULL)
nrow(dfrm2)
    [1] 18
dfrm3  <-  ddply(dfrm, "id", function(x) if(all(x$fac=="<35%")) x else NULL)
nrow(dfrm3)
    [1] 6

I would like to create the third data.frame, containing all the rows that were not selected into dfrm2 or dfrm3. So far I have not succeeded.

I think the idea would be to tell R to drop from the mother dfrm every row whose id has already been selected. Can someone help me with this?

Upvotes: 1

Views: 1913

Answers (2)

Metrics

Reputation: 15458

You can just use the split function:

l <- split(dfrm, dfrm$fac)
names(l) <- paste0("data", seq_along(levels(dfrm$fac)))
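
As a rough illustration of what split returns, here is a minimal sketch on a small made-up data frame (toy and its values are hypothetical, not the asker's data). It also shows that split groups rows by level, so one subject can end up in more than one piece.

```r
# Hypothetical toy data, not the asker's dfrm
toy <- data.frame(
  id  = c(1, 2, 3, 1, 2, 3),
  fac = factor(c("<30%", "<35%", ">35%", "<30%", "<30%", ">35%"),
               levels = c("<30%", "<35%", ">35%"))
)

# One data frame per level of fac, returned as a named list
pieces <- split(toy, toy$fac)
names(pieces) <- paste0("data", seq_along(levels(toy$fac)))

# Number of rows in each piece
sapply(pieces, nrow)

# Subject 2 has rows in both data1 and data2, because the split is row-wise
pieces$data1$id
pieces$data2$id
```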

Updated as per comments:

dfrm4 <- dfrm[!(dfrm$id %in% dfrm2$id | dfrm$id %in% dfrm3$id), ]
dim(dfrm4)
    [1] 117   5
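
The same id-based exclusion can be sketched end to end in base R. This is only a sketch under assumptions: the toy data frame, its values, and the all_at_level helper are hypothetical names introduced here, and ave() stands in for the ddply step used in the question.

```r
# Hypothetical toy data: 4 subjects, each measured at 3 time points
toy <- data.frame(
  id  = rep(1:4, times = 3),
  fac = c("<30%", "<35%", "<30%", ">35%",   # day1
          "<30%", "<35%", "<35%", ">35%",   # day2
          "<30%", "<35%", "<30%", "<30%")   # day3
)

# Hypothetical helper: TRUE for every row of a subject whose fac
# equals `level` at all of that subject's time points
all_at_level <- function(data, level) {
  as.logical(ave(data$fac == level, data$id, FUN = all))
}

toy2 <- toy[all_at_level(toy, "<30%"), ]   # subject 1 only
toy3 <- toy[all_at_level(toy, "<35%"), ]   # subject 2 only

# Keep every row whose id was not selected into toy2 or toy3
toy4 <- toy[!(toy$id %in% c(toy2$id, toy3$id)), ]

# The three pieces together account for every row of toy
nrow(toy2) + nrow(toy3) + nrow(toy4) == nrow(toy)
```

Because the exclusion is done on id rather than on fac, a subject with mixed levels across time points goes to toy4 with all of its rows.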

Upvotes: 2

Drew Steen

Reputation: 16607

I think plyr is the solution to almost every problem in R, but in my opinion this is an exception; bracket subsetting would be clearer & easier.

dfrm2 <- dfrm[dfrm$fac=="<30%", ]
dfrm3 <- dfrm[dfrm$fac=="<35%", ]
dfrm4 <- dfrm[dfrm$fac!="<30%" & dfrm$fac!="<35%", ]
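
The intended filter for dfrm4 keeps rows whose level is neither "<30%" nor "<35%". A minimal sketch on a made-up factor (the fac vector below is hypothetical) shows that combining the two comparisons with a logical AND gives the same result as negating %in%. Note that either form selects individual rows, not whole subjects.

```r
# Hypothetical factor standing in for dfrm$fac
fac <- factor(c("<30%", "<35%", ">35%", "<30%", ">35%"))

# Two comparisons joined with a logical AND
keep_and <- fac != "<30%" & fac != "<35%"

# Equivalent form: negate membership in the set of excluded levels
keep_in <- !(fac %in% c("<30%", "<35%"))

identical(keep_and, keep_in)
```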

Upvotes: 0
