I have a data.frame where I want to delete rows for which the 5th column entries are equal to zero. The data.frame looks like this:
Column1 Column2 Column3 Column4 Column5 Column6
1 A 3 2 1 1
2 D 2 2 4 1
3 D 4 1 0 2
4 E 4 1 0 2
5 F 2 1 A 3
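To make the example reproducible, the table above can be typed in as an R data frame. This is a sketch: the column types are guessed from the printed values. Because Column5 contains "A", it is stored as character, and R coerces the 0 in a comparison such as Column5 != 0 to the string "0".

```r
# Rebuild the sample table; column types are guessed from the printed values.
dataframe <- data.frame(
  Column1 = 1:5,
  Column2 = c("A", "D", "D", "E", "F"),
  Column3 = c(3, 2, 4, 4, 2),
  Column4 = c(2, 2, 1, 1, 1),
  Column5 = c("1", "4", "0", "0", "A"),  # character, because of the "A" in row 5
  Column6 = c(1, 1, 2, 2, 3)
)
str(dataframe)
```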
So in this case the 3rd and the 4th rows should be deleted. My data frame is called dataframe, and currently I use the following code:
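# Collect the rows whose 5th column is zero, then drop them all at once.
drop <- integer(0)
for (i in seq_len(nrow(dataframe))) {
  if (dataframe[i, 5] == 0) {
    drop <- c(drop, i)
  }
}
dataframe2 <- if (length(drop) > 0) dataframe[-drop, ] else dataframe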
The problem is that I have 162000 entries and my code takes a long time. So how can I get a fast implementation of this?
I think:
dataframe2 <- dataframe[dataframe[,5]!=0,]
or
dataframe2 <- dataframe[dataframe[,"Column5"]!=0,]
or
dataframe2 <- subset(dataframe, Column5 != 0)
As @dickoa suggests, you can also index with $:
dataframe2 <- dataframe[dataframe$Column5 != 0,]
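A quick way to convince yourself that these forms are interchangeable here is to run all of them on the question's sample data and compare the results. This sketch assumes the column types guessed from the printed table (Column5 as character); the variable names by_position, by_name, etc. are only illustrative.

```r
# Sample data from the question (Column5 is character because of the "A").
dataframe <- data.frame(
  Column1 = 1:5,
  Column2 = c("A", "D", "D", "E", "F"),
  Column3 = c(3, 2, 4, 4, 2),
  Column4 = c(2, 2, 1, 1, 1),
  Column5 = c("1", "4", "0", "0", "A"),
  Column6 = c(1, 1, 2, 2, 3)
)

by_position <- dataframe[dataframe[, 5] != 0, ]
by_name     <- dataframe[dataframe[, "Column5"] != 0, ]
by_subset   <- subset(dataframe, Column5 != 0)
by_dollar   <- dataframe[dataframe$Column5 != 0, ]

# All four keep rows 1, 2 and 5, with the original row names preserved.
stopifnot(identical(by_position, by_name),
          identical(by_position, by_subset),
          identical(by_position, by_dollar))
rownames(by_position)
```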
In general, [[]] or [,] is slightly more robust and general than indexing with $ (for example, you can use variable names constructed on the fly or numeric indices in [[]], but only literal names with $). subset is the most readable, but less robust in some contexts: it uses non-standard evaluation, and its help page recommends the standard [ operator when programming. For a problem of the size you're describing, all of these approaches should be more or less instantaneous and indistinguishable in terms of speed.
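The two points above, [[ ]] working with a column name held in a variable and the speed on a data set of roughly the question's size, can be checked with a small sketch. The helper drop_zero_rows is hypothetical, and the data is synthetic (random integers 0 to 4 in 162,000 rows), so it only mimics the size in the question; the printed timing will depend on the machine.

```r
# Hypothetical helper: drop rows where the column named by `col` equals zero.
# [[ ]] looks up the name stored in `col`; df$col would look for a column literally named "col".
drop_zero_rows <- function(df, col) {
  df[df[[col]] != 0, , drop = FALSE]
}

# Synthetic data roughly the size mentioned in the question.
set.seed(1)
n <- 162000
big <- data.frame(Column1 = seq_len(n),
                  Column5 = sample(0:4, n, replace = TRUE))

col_name <- paste0("Column", 5)  # column name constructed on the fly
timing <- system.time(kept <- drop_zero_rows(big, col_name))
print(timing)

# With $, the variable is not looked up: big$col_name is simply NULL.
is.null(big$col_name)
```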
You should read an introductory manual to understand basic subsetting:
df <- df[df$Column5 != 0, ]
## Column1 Column2 Column3 Column4 Column5 Column6
## 1 1 A 3 2 1 1
## 2 2 D 2 2 4 1
## 5 5 F 2 1 A 3
Hope it helps.
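The printed result keeps the original row names (1, 2, 5). If consecutive row names are wanted after filtering, one option is to reset them with rownames(df) <- NULL. A small sketch on the question's sample data, assuming the same guessed column types as the table (Column5 as character):

```r
df <- data.frame(
  Column1 = 1:5,
  Column2 = c("A", "D", "D", "E", "F"),
  Column3 = c(3, 2, 4, 4, 2),
  Column4 = c(2, 2, 1, 1, 1),
  Column5 = c("1", "4", "0", "0", "A"),
  Column6 = c(1, 1, 2, 2, 3)
)
df <- df[df$Column5 != 0, ]
rownames(df)          # original row names are kept: "1" "2" "5"
rownames(df) <- NULL  # renumber the rows consecutively
rownames(df)          # "1" "2" "3"
```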