Reputation: 3
Hi Stack Overflow community. I research electricity price dynamics and have a question about filtering or omitting rows from a large data.frame.
My data.frame looks like this and originally has 15 variables:
time_stamp; price; renw_elec; wday;
01.01.2014; 12.5; 25,562.25; 3;
02.01.2014; 14.5; 23,896.56; 4;
03.01.2014; 17.6; 26,634.87; 5;
04.01.2014; 12.9; 30,214.56; 6;
05.01.2014; 10.5; 21,256.56; 0;
06.01.2014; 20.4; 28,985.78; 1;
07.01.2014; 22.7; 32,578.98; 2;
I am trying to filter the data.frame based on the values in the variable wday. For instance, I want to omit all rows where wday is 0 or 1, so that the result looks like this:
time_stamp; price; renw_elec; wday;
01.01.2014; 12.5; 25,562.25; 3;
02.01.2014; 14.5; 23,896.56; 4;
03.01.2014; 17.6; 26,634.87; 5;
04.01.2014; 12.9; 30,214.56; 6;
07.01.2014; 22.7; 32,578.98; 2;
I tried df$wday[is.na(df$wday)]<-0, as described on CRAN, but it did not work at all. What am I doing wrong, and what is the best way to solve this?
Thank you for your help in advance! :)
Upvotes: 0
Views: 78
Reputation: 31161
It is basic filtering on a data.frame:
df[df$wday!=0 & df$wday!=1,]
or
df[df$wday>1,]
or
vec <- c(0, 1)
df[!(df$wday %in% vec),]
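For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample rows from the question and applies the `%in%` variant. The object names `tmp`, `weekdays_only` and `weekdays_only2` are made up for illustration, and the `subset()` call is an extra base R spelling that is not part of the answer above. It also shows why the attempt in the question has no visible effect: that line only overwrites `NA` entries of `wday` with 0 and never drops rows.

```r
# Rebuild the sample data from the question (column names as posted).
df <- data.frame(
  time_stamp = c("01.01.2014", "02.01.2014", "03.01.2014", "04.01.2014",
                 "05.01.2014", "06.01.2014", "07.01.2014"),
  price      = c(12.5, 14.5, 17.6, 12.9, 10.5, 20.4, 22.7),
  renw_elec  = c(25562.25, 23896.56, 26634.87, 30214.56,
                 21256.56, 28985.78, 32578.98),
  wday       = c(3, 4, 5, 6, 0, 1, 2),
  stringsAsFactors = FALSE
)

# The line from the question replaces NA values in wday with 0.
# It does not remove any rows, so the row count is unchanged.
tmp <- df
tmp$wday[is.na(tmp$wday)] <- 0
same_size <- nrow(tmp) == nrow(df)

# Keep only rows whose wday is neither 0 nor 1.
weekdays_only <- df[!(df$wday %in% c(0, 1)), ]

# Equivalent base R spelling (illustrative addition, not from the answer).
weekdays_only2 <- subset(df, !(wday %in% c(0, 1)))

print(weekdays_only)
```

One difference between the variants matters if `wday` can contain missing values: `df$wday > 1` evaluates to `NA` for those rows, and logical indexing with `NA` returns rows filled with `NA`, whereas `%in%` never returns `NA`, so rows with a missing `wday` are simply kept.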
Upvotes: 1
Reputation: 3622
Using dplyr, you can also do:
library(dplyr)
df %>% filter(wday > 1)
time_stamp price renw_elec wday
1 01.01.2014 12.5 25562.25 3
2 02.01.2014 14.5 23896.56 4
3 03.01.2014 17.6 26634.87 5
4 04.01.2014 12.9 30214.56 6
5 07.01.2014 22.7 32578.98 2
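If the values to drop are not a contiguous range, so that `wday > 1` would not work, the same filter can be written with `%in%`. This is only a sketch: the vector name `excluded` and the trimmed-down two-column data frame are assumptions for illustration.

```r
library(dplyr)

# Reduced version of the sample data; only the columns needed here.
df <- data.frame(
  price = c(12.5, 14.5, 17.6, 12.9, 10.5, 20.4, 22.7),
  wday  = c(3, 4, 5, 6, 0, 1, 2)
)

# Hypothetical name for the wday values that should be removed.
excluded <- c(0, 1)

# Keep rows whose wday is not in the excluded set.
kept <- df %>% filter(!wday %in% excluded)

print(kept)
```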
Upvotes: 1