Falcon-StatGuy

Reputation: 347

Error: (subscript) logical subscript too long

Can someone let me know why I am getting this error and how I can fix it?

Here is the code

What I am trying to do is remove every row that has a 1 in any column whose sum is less than 10.

a0=rep(1,40)
a=rep(0:1,20)
b=c(rep(1,20),rep(0,20))
c0=c(rep(0,12),rep(1,28))
c1=c(rep(1,5),rep(0,35))
c2=c(rep(1,8),rep(0,32))
c3=c(rep(1,23),rep(0,17))
c4=c(rep(1,6),rep(0,34))
x=matrix(cbind(a0,a,b,c0,c1,c2,c3,c4),nrow=40,ncol=8)
nam <- paste("V",2:9,sep="")
colnames(x)<-nam
dat <- cbind(y=rnorm(40,50,7),x)
#===================================
toSum <- colSums(dat)
Col <- Val <- NULL
for(i in 1:length(toSum)){
if(toSum[i]<10){
Col <- c(Col,colnames(dat)[i])
Val <- c(Val,toSum[i])}
}
cs <- colSums(dat) < 10
indx <- dat[,which(cs)]==0
for(i in 1:dim(indx)[2]){
datnw <- dat[indx[,i],]
dat <- datnw}
datnw2 <- dat[, -which(cs)]

Thanks

Upvotes: 1

Views: 15932

Answers (1)

MvG

Reputation: 60858

If I understand correctly what you're trying to achieve, you might best write it this way:

cs <- colSums(dat) < 10
dat[rowSums(dat[,cs]) == 0, !cs]

This means: for any column with sum less than 10 (called a “small column” hereafter), drop any row which has a 1 in that column. So you only keep rows which have a zero in all those small columns. You drop the small columns as well, as they would only contain zeros in any case.
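As a quick check, here is a sketch that rebuilds the question's data and applies the two-line approach to it. The names small and kept are illustrative, the seed is arbitrary, and drop = FALSE is an addition that is not in the lines above: it keeps dat[, cs] a matrix even when only one column is small, which rowSums needs.

```r
# Rebuild the question's data (fixed seed so the run is repeatable).
set.seed(1)
x <- cbind(a0 = rep(1, 40), a = rep(0:1, 20),
           b  = c(rep(1, 20), rep(0, 20)),
           c0 = c(rep(0, 12), rep(1, 28)),
           c1 = c(rep(1, 5),  rep(0, 35)),
           c2 = c(rep(1, 8),  rep(0, 32)),
           c3 = c(rep(1, 23), rep(0, 17)),
           c4 = c(rep(1, 6),  rep(0, 34)))
colnames(x) <- paste("V", 2:9, sep = "")
dat <- cbind(y = rnorm(40, 50, 7), x)

cs <- colSums(dat) < 10       # TRUE for the small columns
small <- names(cs)[cs]        # "V6" "V7" "V9"

# Keep rows that are zero in every small column, and drop the small columns.
kept <- dat[rowSums(dat[, cs, drop = FALSE]) == 0, !cs, drop = FALSE]
dim(kept)                     # 32 rows, 6 columns
```

Rows 1 to 8 carry a 1 in at least one of the three small columns, so 32 of the 40 rows survive.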

In your code, indx is a logical matrix with 40 rows, one for each row of input, and one column for each small column in the input. You use the first column of indx to remove the rows with a 1 in the first small column. This results in a new value for dat, which is a few rows shorter than the original. In the next iteration of the loop, you use the second column of indx in an attempt to remove more rows. But this won't work: after the first iteration, dat has fewer than 40 rows, but the second column of indx still has all 40 entries. This is what's causing the error: you're subscripting the rows of a matrix that has fewer than 40 rows with a logical vector of length 40.
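The failure can be reproduced on a much smaller scale. This is only a sketch: m, mask and msg are illustrative names, and the toy matrix stands in for dat.

```r
# A toy 4 x 2 matrix standing in for dat.
m <- matrix(1:8, nrow = 4)
mask <- c(TRUE, FALSE, TRUE, TRUE)   # built while m still had 4 rows

m <- m[mask, ]                       # fine: m now has 3 rows
nrow(m)

# Reusing the 4-element mask on the 3-row matrix raises the error;
# tryCatch captures the message instead of stopping the script.
msg <- tryCatch(m[mask, ], error = function(e) conditionMessage(e))
msg
```

The mask was computed once against the original rows, so it goes stale as soon as the matrix shrinks.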

You could combine the three columns of your indx into a single vector suitable to subscript the rows of interest using the following expression:

apply(indx, 1, all)

This will have a TRUE value in its result for exactly those rows which have TRUE in each column. However, I'd prefer my code above over this, as it is much shorter to write. The most likely reason to prefer the latter is if your data may contain negative numbers, so that a row sum of zero does not imply an all-zero row. That is not a problem in your example data.
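To make the negative-number caveat concrete, here is a small sketch on made-up values (blk, by_sum and by_all are hypothetical names, and the data is not from the question):

```r
# Hypothetical block of "small" columns containing a negative entry.
blk <- cbind(p = c(0, 1, 0), q = c(0, -1, 1))

by_sum <- rowSums(blk) == 0          # row 2 sums to 0 although it is not all-zero
by_all <- apply(blk == 0, 1, all)    # TRUE only where every entry is 0

by_sum   # TRUE  TRUE FALSE
by_all   # TRUE FALSE FALSE
```

The two tests disagree on row 2, which is the only case where the apply version is needed.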

Upvotes: 2
