Reputation: 145

Error Removing multiple columns in R

I am new to R and I am trying to remove many columns using a for loop:

for (i in 15:ncol(DB)){
    DB[, i] <- NULL
}

But I keep on getting this error:

 Error in `[<-.data.frame`(`*tmp*`, , i, value = NULL) : 
 new columns would leave holes after existing columns

Could someone explain why this is happening? Thanks

Upvotes: 3

Answers (3)

Greg Snow

Reputation: 49640

Others have shown how to do what you want, so I will focus on what the error message means and why your approach does not work.

Let's assume that your data frame has 20 columns. The first iteration through the loop will remove column number 15 and in the process will shift all the columns after 15 to fill in the gap, so what was column 16 is now in the position of column 15 and the data frame now has 19 columns.

The second iteration will now remove the column that is in the 16th position (which was originally column 17) and move the others over so that now there are 18 columns.

The third iteration will remove the column in the 17th position (which was originally column 19 having moved twice) and move column 20 down to the 17th position and the data frame now has 17 columns.

The 4th iteration will try to assign NULL to the 18th column, which does not exist but is right next to the last existing column, so R will probably not complain.

The 5th iteration will now try to assign to the 19th column, but with only 17 columns remaining in the data frame this would leave a gap (no 18th column) and hence the error.
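To watch this happen, here is a minimal sketch that assumes a made-up 20-column data frame (the contents of DB and the helper names msg and last_i are only for illustration, not from the question). tryCatch captures the error so that the leftover columns can be inspected afterwards:

```r
# Hypothetical 20-column data frame standing in for the real DB
DB <- data.frame(matrix(1:40, ncol = 20))

last_i <- NA  # remembers the index being processed when the loop stops
msg <- tryCatch({
    for (i in 15:ncol(DB)) {   # 15:ncol(DB) is evaluated once, as 15:20
        last_i <- i
        DB[, i] <- NULL
    }
    "no error"
}, error = function(e) conditionMessage(e))

msg        # the error message captured from the failing assignment
last_i     # the index at which the loop stopped
names(DB)  # X1 to X14, plus X16, X18 and X20, which the loop skipped over
```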

This is probably not the result that you want, since the original 16th, 18th, and 20th columns are still in the data frame, just in different positions. This is one of the reasons that you need to be careful when modifying any object in a loop. For simple deletion, the other answers show better approaches. But if you want to use a loop because you are only going to delete conditionally, then it is still possible; you just need to work backwards (right to left, high to low) by using ncol(DB):15 instead of 15:ncol(DB). This starts with the last column and moves down, so that any columns that are shifted are ones that have already been tested and processed.
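As a rough sketch of the backwards loop, assume a 20-column data frame and an invented rule of "drop any column from 15 onwards that contains an NA". Both the data and the condition are placeholders, and the loop assumes DB has at least 15 columns:

```r
# Hypothetical data: 20 columns, with NAs planted in columns 16 and 19
DB <- data.frame(matrix(1:40, ncol = 20))
DB[1, c(16, 19)] <- NA

# Walk from the last column down to column 15, so that deleting a column
# only shifts columns that have already been checked
for (i in ncol(DB):15) {
    if (anyNA(DB[[i]])) {   # example condition: drop columns containing NA
        DB[[i]] <- NULL
    }
}

names(DB)  # X16 and X19 are gone; every other column is kept
```

Because the index only ever moves towards columns that have not been touched yet, no column is skipped and no assignment lands past the end of the data frame.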

Upvotes: 6

Stefan Avey

Reputation: 1188

While I'm not sure that it is good style, you can also use negative indices as shorthand for indices to exclude and I do this often.

mydf <- data.frame(matrix(1:20, ncol = 10))
mydf
#   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1  1  3  5  7  9 11 13 15 17  19
# 2  2  4  6  8 10 12 14 16 18  20
mydf[,-(4:7)]  ## columns 4 through 7 are excluded
#   X1 X2 X3 X8 X9 X10
# 1  1  3  5 15 17  19
# 2  2  4  6 16 18  20

Be careful with the order of operations if you do use negative indices: unary - binds more tightly than :, so -4:7 is parsed as (-4):7, the sequence -4, -3, ..., 7. That mixes negative and positive indices and gives

mydf[,-4:7]
# Error in .subset(x, j) : only 0's may be mixed with negative subscripts
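A short sketch of the difference, reusing the mydf example from this answer (kept and one_col are names made up for illustration). It also shows drop = FALSE, which is worth adding whenever the exclusion might leave a single column:

```r
identical(-4:7, (-4):7)      # TRUE: unary minus is applied before `:`
-(4:7)                       # -4 -5 -6 -7, the indices to exclude

mydf <- data.frame(matrix(1:20, ncol = 10))
kept <- mydf[, -(4:7)]
names(kept)                  # "X1" "X2" "X3" "X8" "X9" "X10"

# If only one column survives, `[ , j]` returns a plain vector unless
# drop = FALSE is given
one_col <- mydf[, -(2:10), drop = FALSE]
is.data.frame(one_col)       # TRUE
```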

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

A for loop is not required to do this. Just use list(NULL) to (destructively) drop the columns you want to drop.

Example:

mydf <- data.frame(matrix(1:20, ncol = 10))
mydf
#   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1  1  3  5  7  9 11 13 15 17  19
# 2  2  4  6  8 10 12 14 16 18  20
mydf[4:7] <- list(NULL)
mydf
#   X1 X2 X3 X8 X9 X10
# 1  1  3  5 15 17  19
# 2  2  4  6 16 18  20
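Applied to the question, the same idea might look like the sketch below. The 20-column DB_full is a made-up stand-in for the asker's data, and the ncol() guard is an extra precaution that is not part of the original answer:

```r
# Hypothetical stand-in for the asker's data frame
DB_full <- data.frame(matrix(1:40, ncol = 20))
DB <- DB_full

# Guard against data frames with fewer than 15 columns, where
# 15:ncol(DB) would count downwards instead of being empty
if (ncol(DB) >= 15) {
    DB[15:ncol(DB)] <- list(NULL)
}
ncol(DB)  # 14

# Non-destructive alternative: keep the first 14 columns in a new object
kept <- DB_full[seq_len(min(14, ncol(DB_full)))]
identical(names(kept), names(DB))  # TRUE
```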

Upvotes: 2
