Reputation: 77
I have a data
as follows:
data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
Wt=c(91,92,85,205,285,43,95,75,76,NA),
Ht=c(185,182,173,171,600,650,NA,890,NA,NA))
Wt
represents the weight in kilograms and Ht
represents the height in centimeters. In this example, I want to treat the values of Wt
bigger than 200 as outliers and change to some specific numbers.
Also, I want to treat the values of Ht
bigger than 250 as outliers and change to NA
.
In my actual data
, there are few outliers in Wt
and many outliers in Ht
.
So, I could find the outliers for Wt
by using the code below:
a1<-data$Wt
a1<-data.frame(a1)
a1<-na.omit(a1)
b1<-a1[a1$a1>200, ]
b1 #205,285
I want to change 205 to 80 and change 285 to 90. (Because, in my actual data, there are few outliers for Wt
, so that I can change them individually.)
Also, I want to make the values of Ht
bigger than 250 as NA
. So my expected output is as follows:
data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
Wt=c(91,92,85,80,90,43,95,75,76,NA),
Ht=c(185,182,173,171,NA,NA,NA,NA,NA,NA))
Upvotes: 2
Views: 181
Reputation: 1075
Do it by reference using data.table
:
library(data.table)
setDT(data)
data[Ht > 250, Ht := NA]
data[Wt == 205, Wt := 80]
data[Wt == 285, Wt := 90]
data
id Wt Ht
1: 1 91 185
2: 2 92 182
3: 3 85 173
4: 4 80 171
5: 5 90 NA
6: 6 43 NA
7: 7 95 NA
8: 8 75 NA
9: 9 76 NA
10: 10 NA NA
For more info, see: Introduction to data.table
.
To implement the ifelse
method suggested by Shawn in data.table
, you could do something like the following. This is still updating by reference, so there is still no need to write something like data <- data %>% ...
.
library(data.table)
setDT(data)
data[, `:=`(Ht = fifelse(Ht > 250, NA_real_, Ht),
Wt = fcase(Wt == 205, 80,
Wt == 285, 90,
rep(TRUE, .N), Wt))]
Note that fifelse
is data.table
's fast ifelse
, and I use fcase
to handle the two Wt
conditions together (although the rep(TRUE, .N)
trick is a bit hacky, so maybe it would be nicer to just use two fifelse
calls for Wt
instead of fcase
).
Update 2023-07-17: If you need a vectorised option, I suggest using plyr::mapvalues
. Even though plyr
is retired, you can just use body(plyr::mapvalues)
and do the same thing for it's few dependencies, and hence put the code into your own project.
Upvotes: 3
Reputation: 3228
The above answer is useful. I also wanted to add an alternative answer in case you may find it helpful to learn other functions. You can plug in any values you want with ifelse
and the functionality of the tidyverse
. As an example, I use mutate
here to create variables and ifelse
to simply transform the values you wanted. Below is essentially just your data and two functions combined into one command:
library(tidyverse)
data %>%
mutate(Wt = ifelse(Wt > 200,
"9999",
Wt),
Ht = ifelse(Ht > 250,
"NA",
Ht))
Annotated below is what I am doing with the code:
library(tidyverse) # load this library for %>% and mutate
data %>% # use this data
mutate(Wt = ifelse(Wt > 200, # take Wt over 200
"9999", # replace with this value
Wt), # otherwise use the original Wt value
Ht = ifelse(Ht > 250, # take Ht over 250
"NA", # replace with this value
Ht)) # otherwise use the original Ht value
Which should give you any desired output depending on how you use it:
id Wt Ht
1 1 91 185
2 2 92 182
3 3 85 173
4 4 9999 171
5 5 9999 NA
6 6 43 NA
7 7 95 <NA>
8 8 75 NA
9 9 76 <NA>
10 10 <NA> <NA>
Try it out and lemme know what you think!
Upvotes: 1