Lee922
Lee922

Reputation: 77

Replace certain values in data.frame columns

I have a data as follows:

data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
                 Wt=c(91,92,85,205,285,43,95,75,76,NA),
                 Ht=c(185,182,173,171,600,650,NA,890,NA,NA))

Wt represents the weight in kilograms and Ht represents the height in centimeters. In this example, I want to treat the values of Wt bigger than 200 as outliers and change to some specific numbers. Also, I want to treat the values of Ht bigger than 250 as outliers and change to NA. In my actual data, there are few outliers in Wt and many outliers in Ht. So, I could find the outliers for Wt by using the code below:

a1<-data$Wt 

a1<-data.frame(a1)
a1<-na.omit(a1)
b1<-a1[a1$a1>200, ]
b1  #205,285

I want to change 205 to 80 and change 285 to 90. (Because, in my actual data, there are few outliers for Wt, so that I can change them individually.) Also, I want to make the values of Ht bigger than 250 as NA. So my expected output is as follows:

data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
                 Wt=c(91,92,85,80,90,43,95,75,76,NA),
                 Ht=c(185,182,173,171,NA,NA,NA,NA,NA,NA))

Upvotes: 2

Views: 181

Answers (2)

diomedesdata
diomedesdata

Reputation: 1075

Do it by reference using data.table:

library(data.table)
setDT(data)

data[Ht > 250, Ht := NA]
data[Wt == 205, Wt := 80]
data[Wt == 285, Wt := 90]
data
    id Wt  Ht
 1:  1 91 185
 2:  2 92 182
 3:  3 85 173
 4:  4 80 171
 5:  5 90  NA
 6:  6 43  NA
 7:  7 95  NA
 8:  8 75  NA
 9:  9 76  NA
10: 10 NA  NA

For more info, see: Introduction to data.table.


To implement the ifelse method suggested by Shawn in data.table, you could do something like the following. This is still updating by reference, so there is still no need to write something like data <- data %>% ....

library(data.table)
setDT(data)

data[, `:=`(Ht = fifelse(Ht > 250, NA_real_, Ht),
            Wt = fcase(Wt == 205, 80, 
                       Wt == 285, 90,
                       rep(TRUE, .N), Wt))]

Note that fifelse is data.table's fast ifelse, and I use fcase to handle the two Wt conditions together (although the rep(TRUE, .N) trick is a bit hacky, so maybe it would be nicer to just use two fifelse calls for Wt instead of fcase).


Update 2023-07-17: If you need a vectorised option, I suggest using plyr::mapvalues. Even though plyr is retired, you can just use body(plyr::mapvalues) and do the same thing for it's few dependencies, and hence put the code into your own project.

Upvotes: 3

Shawn Hemelstrand
Shawn Hemelstrand

Reputation: 3228

The above answer is useful. I also wanted to add an alternative answer in case you may find it helpful to learn other functions. You can plug in any values you want with ifelse and the functionality of the tidyverse. As an example, I use mutate here to create variables and ifelse to simply transform the values you wanted. Below is essentially just your data and two functions combined into one command:

library(tidyverse)

data %>%
  mutate(Wt = ifelse(Wt > 200,
                     "9999",
                     Wt),
         Ht = ifelse(Ht > 250,
                     "NA",
                     Ht))

Annotated below is what I am doing with the code:

library(tidyverse) # load this library for %>% and mutate

data %>% # use this data 
  mutate(Wt = ifelse(Wt > 200, # take Wt over 200
                     "9999", # replace with this value
                     Wt), # otherwise use the original Wt value
         Ht = ifelse(Ht > 250, # take Ht over 250
                     "NA", # replace with this value
                     Ht)) # otherwise use the original Ht value

Which should give you any desired output depending on how you use it:

   id   Wt   Ht
1   1   91  185
2   2   92  182
3   3   85  173
4   4 9999  171
5   5 9999   NA
6   6   43   NA
7   7   95 <NA>
8   8   75   NA
9   9   76 <NA>
10 10 <NA> <NA>

Try it out and lemme know what you think!

Upvotes: 1

Related Questions