user2016517

Reputation: 1

set out-of-range values to NA

I have a dataset like the one below. A sensor records the data every minute. WEIGHT is the dependent variable, and TIME is the hour and minute. This data will accumulate for years. The problem is row 4: its weight is out of range because of a sensor error. Keep in mind that nobody can predict when such a value will occur.

What I want is a procedure like the following:
1. Set the valid range for weight (from 10 to 50).
2. Using a for (i) loop, check whether each weight is within the range.
3. When a weight is out of range, set weight[i] to NA.

 ID      TIME   WEIGHT
HM001   1223    24.9
HM001   1224    25.2
HM001   1225    25.5
HM001   1226    12233
HM001   1227    25.7
HM001   1228    27.1

Upvotes: 0

Views: 1786

Answers (3)

Carl Witthoft

Reputation: 21502

Because I couldn't resist:

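library(microbenchmark)

# One million uniform values in [0, 1]; the 0.1-0.5 bounds stand in for 10-50.
fooweight <- runif(1e6)
# Both functions change only a local copy of x; they exist purely for timing.
wfun1 <- function(x) x[x < .1 | x > .5] <- NA
wfun2 <- function(x) is.na(x) <- (x < .10 | x > .50)
microbenchmark(wfun1(fooweight), wfun2(fooweight), times = 100)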

Unit: milliseconds
              expr      min       lq   median       uq      max
1 wfun1(fooweight) 45.00671 47.68492 49.27120 50.28852 152.4313
2 wfun2(fooweight) 47.74992 51.05204 51.89938 53.00629 156.0306

Sorry, Sven, you lose to juba by about 5% :-)
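The benchmark compares speed only. As a quick sketch, assuming the two functions are rewritten to return the modified vector (the names clip1 and clip2 are hypothetical, not from the original answer), one can check that both forms produce the same result:

```r
set.seed(1)
x <- runif(1000)

# Direct subset assignment, then return the modified vector.
clip1 <- function(x) { x[x < .1 | x > .5] <- NA; x }
# Same replacement expressed through the is.na<- replacement function.
clip2 <- function(x) { is.na(x) <- x < .1 | x > .5; x }

# Compare the two results element for element.
same <- identical(clip1(x), clip2(x))
same
```

Since both forms do the same work, the roughly 5% gap is small enough that readability is a reasonable basis for choosing between them.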

Upvotes: 2

Sven Hohenstein

Reputation: 81693

You could use within and is.na<- for this problem. Assuming your data frame is called dat:

within(dat, is.na(WEIGHT) <- WEIGHT < 10 | WEIGHT > 50)

     ID TIME WEIGHT
1 HM001 1223   24.9
2 HM001 1224   25.2
3 HM001 1225   25.5
4 HM001 1226     NA
5 HM001 1227   25.7
6 HM001 1228   27.1
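Note that within() returns a modified copy rather than changing dat in place, so the result has to be assigned to keep it. A minimal sketch, assuming the question's sample rows are rebuilt by hand (the data.frame() call and the name dat_clean are illustrative, not from the original post):

```r
# Rebuild the sample data from the question (hypothetical construction).
dat <- data.frame(
  ID     = rep("HM001", 6),
  TIME   = 1223:1228,
  WEIGHT = c(24.9, 25.2, 25.5, 12233, 25.7, 27.1)
)

# within() evaluates the expression on a copy of dat and returns that copy,
# so assign the result to keep the change.
dat_clean <- within(dat, is.na(WEIGHT) <- WEIGHT < 10 | WEIGHT > 50)

dat$WEIGHT[4]        # the original data frame is untouched
dat_clean$WEIGHT[4]  # the out-of-range reading is now NA
```

Writing dat <- within(dat, ...) instead would overwrite the original, which may be what you want once the range check is trusted.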

Upvotes: 3

juba

Reputation: 49033

If your data is in a data frame called d, you can use:

d$WEIGHT[d$WEIGHT < 10 | d$WEIGHT > 50] <- NA

You shouldn't use for loops for this kind of task; use vector indexing instead.
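To make the contrast concrete, here is a small sketch that puts the loop described in the question next to the vectorised form. The data frame d is rebuilt by hand from the question's sample, and lo, hi, w_loop and w_vec are illustrative names, not part of the original post:

```r
# Rebuild the sample data from the question (hypothetical construction).
d <- data.frame(
  ID     = rep("HM001", 6),
  TIME   = 1223:1228,
  WEIGHT = c(24.9, 25.2, 25.5, 12233, 25.7, 27.1)
)
lo <- 10  # assumed lower bound, taken from the question
hi <- 50  # assumed upper bound, taken from the question

# Loop version, as described in the question: checks one row at a time.
w_loop <- d$WEIGHT
for (i in seq_along(w_loop)) {
  if (!is.na(w_loop[i]) && (w_loop[i] < lo || w_loop[i] > hi)) {
    w_loop[i] <- NA
  }
}

# Vectorised version: a single logical index covers every row at once.
w_vec <- d$WEIGHT
w_vec[w_vec < lo | w_vec > hi] <- NA

# Check that both approaches agree.
same <- identical(w_loop, w_vec)
same
```

The loop needs an explicit is.na() guard so that an existing NA does not break the if() condition, whereas the vectorised assignment leaves existing NA values as they are.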

Upvotes: 4
