Jen Bohold

Reputation: 1068

How to speed up this code deleting certain rows?

I have a data.frame where I want to delete rows for which the 5th column entries are equal to zero.

The data.frame looks like this:

Column1 Column2 Column3 Column4 Column5 Column6
1       A         3       2       1       1
2       D         2       2       4       1
3       D         4       1       0       2
4       E         4       1       0       2
5       F         2       1       A       3

So in this case the 3rd and the 4th rows should be deleted. My data frame is called dataframe, and currently I use the following code:

for(i in 1:length(dataframe[,1])){ 
  if (dataframe[i,5]==0) {
    dataframe2<-dataframe[-i,] 
  } 
}

The problem is that I have 162000 entries and my code takes a long time. So how can I get a fast implementation of this?

Upvotes: 0

Views: 63

Answers (2)

Ben Bolker

Reputation: 226087

I think:

dataframe2 <- dataframe[dataframe[,5]!=0,] 

or

dataframe2 <- dataframe[dataframe[,"Column5"]!=0,] 

or

dataframe2 <- subset(dataframe, Column5 != 0)

As @dickoa suggests you can also index with $:

dataframe2 <- dataframe[dataframe$Column5 != 0,] 

In general:

  • indexing by column name is more robust and often more readable than indexing by number (although in your example the column names aren't meaningful)
  • indexing with [[]] or [,] is slightly more robust and general than indexing with $ (for example, you can use variable names constructed on the fly or numeric indices in [[]], but only names typed literally with $)
  • subset is the most readable, but less robust in some contexts
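To illustrate the second point, here is a minimal sketch. It assumes a toy data frame loosely modelled on the one in the question; the values and the variable name `target` are made up for the example.

```r
# Toy data loosely modelled on the question (hypothetical values).
# Column5 is character because one entry is "A".
dataframe <- data.frame(
  Column1 = 1:5,
  Column5 = c("1", "4", "0", "0", "A"),
  stringsAsFactors = FALSE
)

# Build the column name on the fly; [[ ]] accepts a character variable
target <- paste0("Column", 5)
dataframe2 <- dataframe[dataframe[[target]] != 0, ]

# $ takes its argument literally: this looks for a column named "target"
# and finds none, so the result is NULL
is.null(dataframe$target)
```

The same pattern works with a numeric index, e.g. dataframe[[2]] in this toy example.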

For a problem of the size you're describing all of these approaches should be more or less instantaneous/indistinguishable in terms of speed.
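If you want to check this on your own machine, a rough sketch along these lines should do. The data here are simulated (an assumption, since the real data frame isn't available), with the same number of rows as in the question; the name `big` is arbitrary.

```r
# Simulated stand-in for the real data: 162000 rows, Column5 drawn from 0..4
set.seed(1)
n <- 162000
big <- data.frame(
  Column1 = seq_len(n),
  Column5 = sample(0:4, n, replace = TRUE)
)

# Time the vectorized subset; the exact numbers depend on the machine
timing <- system.time(
  big2 <- big[big$Column5 != 0, ]
)
print(timing)

# Sanity check: no zero entries remain in Column5
stopifnot(all(big2$Column5 != 0))
```

Swapping in any of the other forms above inside system.time() lets you compare them directly.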

Upvotes: 2

dickoa

Reputation: 18437

You should read an introductory manual to understand basic subsetting:

df <- df[df$Column5 != 0, ]
##   Column1 Column2 Column3 Column4 Column5 Column6
## 1       1       A       3       2       1       1
## 2       2       D       2       2       4       1
## 5       5       F       2       1       A       3

Hope it helps.

Upvotes: 1
