Reputation: 1
I have a dataset like the one below.
The data is recorded every minute by a sensor. WEIGHT is the dependent variable, and TIME is the hour and minute. The data will accumulate for years.
The problem is row[4]. In this row, WEIGHT has a strange value (it is out of range) caused by a sensor error. Keep in mind that nobody can predict when such a value will occur.
What I want is a procedure that works like this:
1. Using some method, set the allowed range for the weight (from 10 to 50).
2. Using a for(i) statement, check whether each weight is in the range.
3. When a weight is out of range, set weight[i] to NA.
ID TIME WEIGHT
HM001 1223 24.9
HM001 1224 25.2
HM001 1225 25.5
HM001 1226 12233
HM001 1227 25.7
HM001 1228 27.1
Upvotes: 0
Views: 1786
Reputation: 21502
Because I couldn't resist:
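library(microbenchmark)
# One million uniform random values in [0, 1] as stand-in weights.
fooweight <- runif(1e6)
# Approach 1: direct logical indexing.
wfun1 <- function(x) x[x < .1 | x > .5] <- NA
# Approach 2: the replacement form of is.na.
wfun2 <- function(x) is.na(x) <- (x < .10 | x > .50)
microbenchmark(wfun1(fooweight), wfun2(fooweight), times = 100)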
Unit: milliseconds
expr min lq median uq max
1 wfun1(fooweight) 45.00671 47.68492 49.27120 50.28852 152.4313
2 wfun2(fooweight) 47.74992 51.05204 51.89938 53.00629 156.0306
Sorry, Sven, you lose to juba by about 5% :-)
Upvotes: 2
Reputation: 81693
You could use within and is.na<- for this problem. Assuming your data frame is called dat:
within(dat, is.na(WEIGHT) <- WEIGHT < 10 | WEIGHT > 50)
ID TIME WEIGHT
1 HM001 1223 24.9
2 HM001 1224 25.2
3 HM001 1225 25.5
4 HM001 1226 NA
5 HM001 1227 25.7
6 HM001 1228 27.1
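For a reproducible check, the sample rows from the question can be rebuilt by hand. This is only a sketch: the column types are assumed (ID as character, TIME and WEIGHT as numeric), and the name cleaned is made up for illustration. It also shows that within returns a modified copy, so the result has to be assigned if you want to keep it.

```r
# Sample rows from the question; the column types are an assumption.
dat <- data.frame(
  ID     = rep("HM001", 6),
  TIME   = 1223:1228,
  WEIGHT = c(24.9, 25.2, 25.5, 12233, 25.7, 27.1),
  stringsAsFactors = FALSE
)

# within() returns a modified copy; dat itself stays unchanged
# unless the result is assigned. 'cleaned' is a hypothetical name.
cleaned <- within(dat, is.na(WEIGHT) <- WEIGHT < 10 | WEIGHT > 50)

cleaned$WEIGHT   # the out-of-range value in row 4 is now NA
dat$WEIGHT[4]    # the original data frame still holds 12233
```

To overwrite the original instead, assign the result back to dat.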
Upvotes: 3
Reputation: 49033
If your data is in a data frame called d, you can use:
d$WEIGHT[d$WEIGHT<10 | d$WEIGHT>50] <- NA
You shouldn't use for loops for this kind of task; use vector indexing instead.
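To illustrate the difference, the sketch below runs the loop described in the question next to the indexed version on the sample weights and checks that they agree. The names w, w_loop, w_vec, lower and upper are invented for this illustration, and the bounds are taken from the 10-to-50 range in the question.

```r
# Sample weights from the question.
w <- c(24.9, 25.2, 25.5, 12233, 25.7, 27.1)
lower <- 10   # assumed bounds, from the question's 10-to-50 range
upper <- 50

# Loop version, as outlined in the question.
w_loop <- w
for (i in seq_along(w_loop)) {
  # Skip values that are already missing so the condition is never NA.
  if (!is.na(w_loop[i]) && (w_loop[i] < lower || w_loop[i] > upper)) {
    w_loop[i] <- NA
  }
}

# Vectorized version: one logical index, no explicit loop.
w_vec <- w
w_vec[w_vec < lower | w_vec > upper] <- NA

# Check that both approaches give the same result.
identical(w_loop, w_vec)
```

The indexed form is shorter and avoids the per-element bookkeeping that the loop needs.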
Upvotes: 4