Reputation: 169
I am trying to split a "mother" data.frame into three data.frames. The mother data.frame, called dfrm, has several variables, including id (subject identifier), time (three time points), a numeric variable Ht, and a factor fac with 3 levels derived from Ht.
I created two data.frames, dfrm2 and dfrm3, using the ddply function, selecting the subjects that have a given level of the fac variable AT EACH OF THE THREE TIME POINTS:
id <- rep(c(seq(1,50,1)),3)
time <- factor(rep(c("day1", "day2", "day3"), c(50,50,50)), levels=c("day1", "day2", "day3"), labels=c("day1", "day2", "day3"), ordered=TRUE)
Ht <- rnorm(150, mean=30, sd=3)
A <- rnorm(150, mean=7, sd=10)
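# data.frame() keeps time as an ordered factor; as.data.frame(cbind(...)) would coerce it to its integer codes
df <- data.frame(id, time, Ht, A)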
head(df)
fac <- factor(cut(df$Ht, breaks=c(1,30,35,100), labels=c("<30%","<35%", ">35%"), include.lowest=TRUE))
dfrm <- as.data.frame(cbind(df,fac))
library(plyr)
dfrm2 <- ddply(dfrm, "id", function(x) if(all(x$fac=="<30%")) x else NULL)
nrow(dfrm2)
[1] 18
dfrm3 <- ddply(dfrm, "id", function(x) if(all(x$fac=="<35%")) x else NULL)
nrow(dfrm3)
[1] 6
I would like to create the third data.frame, containing all the rows that were not selected into dfrm2 or dfrm3. So far I have not been successful.
I think the idea could be to tell R to keep only the rows of the mother dfrm whose id has not been selected yet. Can someone help me with this?
Upvotes: 1
Views: 1913
Reputation: 15458
You can just use the split function:
l <- split(df, dfrm$fac)
names(l) <- paste0("data", seq_along(levels(dfrm$fac)))
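To make clearer what split returns, here is a minimal sketch on a made-up data.frame (toy, parts and renamed are illustrative names, not from the question). It splits row by row on the grouping factor, so it does not by itself apply the "same level at every time point" rule from the question.

```r
# Hypothetical toy data: six rows, grouping factor g with three levels.
toy <- data.frame(x = 1:6, g = factor(c("a", "b", "a", "c", "b", "a")))

# split() returns a named list with one data.frame per level of g.
parts <- split(toy, toy$g)
names(parts)   # "a" "b" "c"
parts$a$x      # 1 3 6

# Optionally rename the list elements, as in the answer above.
renamed <- setNames(parts, paste0("data", seq_along(parts)))
names(renamed) # "data1" "data2" "data3"
```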
Updated as per comments:
dfrm4 <- dfrm[!(dfrm$id %in% dfrm2$id | dfrm$id %in% dfrm3$id), ]
dim(dfrm4)
[1] 117 5
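Since the question's data are random and no seed is set, the row counts will differ from run to run. As a rough, self-contained sketch of the same idea, the code below rebuilds similar data with an arbitrary seed and checks that the three pieces cover every row exactly once. It uses base R's tapply instead of ddply so that it runs without plyr; the helper ids_all is a hypothetical name introduced only for this illustration.

```r
# Toy data mirroring the question; the seed value is an arbitrary assumption.
set.seed(1)
dfrm <- data.frame(
  id   = rep(1:50, 3),
  time = rep(c("day1", "day2", "day3"), each = 50),
  Ht   = rnorm(150, mean = 30, sd = 3)
)
dfrm$fac <- cut(dfrm$Ht, breaks = c(1, 30, 35, 100),
                labels = c("<30%", "<35%", ">35%"), include.lowest = TRUE)

# Hypothetical helper: ids whose fac equals `level` at every time point.
ids_all <- function(level) {
  same <- tapply(dfrm$fac == level, dfrm$id, all)
  as.integer(names(same)[same])
}
ids2 <- ids_all("<30%")
ids3 <- ids_all("<35%")

dfrm2 <- dfrm[dfrm$id %in% ids2, ]
dfrm3 <- dfrm[dfrm$id %in% ids3, ]
# Everything else: rows whose id is in neither group.
dfrm4 <- dfrm[!(dfrm$id %in% c(ids2, ids3)), ]

# The three pieces should account for every row exactly once.
stopifnot(nrow(dfrm2) + nrow(dfrm3) + nrow(dfrm4) == nrow(dfrm))
```

Because each id has three rows, nrow(dfrm2) and nrow(dfrm3) should always be multiples of 3 here.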
Upvotes: 2
Reputation: 16607
I think plyr is the solution to almost every problem in R, but in my opinion this is an exception; bracket subsetting would be clearer and easier.
dfrm2 <- dfrm[dfrm$fac == "<30%", ]
dfrm3 <- dfrm[dfrm$fac == "<35%", ]
dfrm4 <- dfrm[dfrm$fac != "<30%" & dfrm$fac != "<35%", ]
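One caveat: the bracket subsetting above filters row by row, whereas the question asks for subjects that have the same level at all three time points. If that per-id rule is what is wanted, one possible base R sketch uses ave to spread a per-id test back onto every row. The toy data and the helper always are made up for illustration.

```r
# Hand-made example: 3 ids, 3 time points each.
toy <- data.frame(
  id  = rep(1:3, each = 3),
  fac = c("<30%", "<30%", "<30%",   # id 1: always "<30%"
          "<35%", "<35%", "<35%",   # id 2: always "<35%"
          "<30%", "<35%", ">35%")   # id 3: mixed levels
)

# Hypothetical helper: for each row, TRUE if every row of the same id
# has the given level.
always <- function(level) {
  as.logical(ave(toy$fac == level, toy$id, FUN = all))
}

toy2 <- toy[always("<30%"), ]                      # id 1 only
toy3 <- toy[always("<35%"), ]                      # id 2 only
toy4 <- toy[!always("<30%") & !always("<35%"), ]   # remaining ids (id 3)
```

The same pattern should carry over to dfrm by replacing toy with dfrm inside the helper.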
Upvotes: 0