Reputation: 347
Can some one let me know why I am getting this error and how I can fix it?
Here is the code
What I am trying to do is remove the rows that associated 1's if the column of that one's less than 10
a0=rep(1,40)
a=rep(0:1,20)
b=c(rep(1,20),rep(0,20))
c0=c(rep(0,12),rep(1,28))
c1=c(rep(1,5),rep(0,35))
c2=c(rep(1,8),rep(0,32))
c3=c(rep(1,23),rep(0,17))
c4=c(rep(1,6),rep(0,34))
x=matrix(cbind(a0,a,b,c0,c1,c2,c3,c4),nrow=40,ncol=8)
nam <- paste("V",2:9,sep="")
colnames(x)<-nam
dat <- cbind(y=rnorm(40,50,7),x)
#===================================
toSum <- colSums(dat)
Col <- Val <- NULL
for(i in 1:length(toSum)){
if(toSum[i]<10){
Col <- c(Col,colnames(dat)[i])
Val <- c(Val,toSum[i])}
}
cs <- colSums(dat) < 10
indx <- dat[,which(cs)]==0
for(i in 1:dim(indx)[2]){
datnw <- dat[indx[,i],]
dat <- datnw}
datnw2 <- dat[, -which(cs)]
Thanks
Upvotes: 1
Views: 15932
Reputation: 60858
If I understand correctly what you're trying to achieve, you might best write it this way:
cs <- colSums(dat) < 10
dat[rowSums(dat[,cs]) == 0, !cs]
This means: for any column with sum less than 10 (called a “small column” hereafter), drop any row which has a 1 in that column. So you only keep rows which have a zero in all those small columns. You drop the small columns as well, as they would only contain zeros in any case.
In your code, indx
is a logical data frame with 40 rows, one for each row of input, and one column for each small column in the input. You use the first column of idx
to remove the rows with a 1 in the first short column. This results in a new value for dat
, which is a few rows shorter than the original. In the next iteration of the loop, you use the second logical vector in an attempt to remove more rows. But this won't work: after the first iteration, dat
has less than 40 rows, but the second column still has all 40 rows. This is what's causing the error: you're subscripting a vector of less than 40 elements with a logical vector of length 40.
You could combine the three columns of your indx
into a single vector suitable to subscript the rows of interest using the following expression:
apply(indx, 1, all)
This will have a TRUE
value in its result for exactly those rows which have TRUE
in each column. However, I guess I'd prefer my code above over this, as it is much shorter to write. The most likely reason to prefer the latter is if your data frame may contain negative number, so that a row sum of zero does not imply an all-zero row. Not a problem in your example data.
Upvotes: 2