TKM

Reputation: 3

Missing values imputation for some records in a dataframe

This is my dataframe

Age <- c(10, 20, 15, NA, 34, NA, 40, NA, 50, NA)
Salary <- c(100, 120, 113, 140, 150, 160, 170, 180, 190, 200)
dat <- data.frame(Age, Salary)

I want to impute missing values of Age with 12 only when Salary < 150, and with 30 only when Salary > 150. I have been trying to do this with dplyr, but as I am new to R I have not found a way. How would I write this in R? Thanks

Upvotes: 0

Views: 49

Answers (2)

COLO

Reputation: 1114

Using data.table:

library(data.table)
dat <- data.table(dat)
dat[is.na(Age) & Salary < 150, Age := 12]
dat[is.na(Age) & Salary > 150, Age := 30]

> dat
     Age Salary
 1:  10    100
 2:  20    120
 3:  15    113
 4:  12    140
 5:  34    150
 6:  30    160
 7:  40    170
 8:  30    180
 9:  50    190
10:  30    200

It is not a one-liner, but it is easy to follow if you are a beginner with R.
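Since the question mentions dplyr, the same two conditions can also be sketched with `mutate()` and `case_when()`. This is only a minimal sketch that rebuilds the data frame from the question; the fallback branch `TRUE ~ Age` is my own assumption about what should happen to rows that match neither condition (they keep their current Age, so a missing Age with Salary exactly 150 would stay NA).

```r
library(dplyr)

# Data from the question
Age <- c(10, 20, 15, NA, 34, NA, 40, NA, 50, NA)
Salary <- c(100, 120, 113, 140, 150, 160, 170, 180, 190, 200)
dat <- data.frame(Age, Salary)

dat <- dat %>%
  mutate(Age = case_when(
    is.na(Age) & Salary < 150 ~ 12,  # missing Age, low salary
    is.na(Age) & Salary > 150 ~ 30,  # missing Age, high salary
    TRUE ~ Age                       # everything else is left unchanged
  ))
```

The conditions are checked in order, so each row takes the value from the first branch that matches.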

Upvotes: 1

Bea

Reputation: 1110

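This could be an option (note that it assigns 30 to every missing Age whose Salary is not below 150, including a Salary of exactly 150):

dat$Age[is.na(dat$Age)] <- ifelse(dat$Salary[is.na(dat$Age)] < 150, 12, 30)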
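If the strict `> 150` condition from the question matters, a base R variant with two separate assignments might look like the sketch below. The helper name `na_age` is hypothetical, and the data frame is rebuilt from the question so the snippet runs on its own. A missing Age with Salary exactly 150 is deliberately left as NA here.

```r
# Data from the question
Age <- c(10, 20, 15, NA, 34, NA, 40, NA, 50, NA)
Salary <- c(100, 120, 113, 140, 150, 160, 170, 180, 190, 200)
dat <- data.frame(Age, Salary)

# Record which rows were missing before any value is filled in
na_age <- is.na(dat$Age)

dat$Age[na_age & dat$Salary < 150] <- 12
dat$Age[na_age & dat$Salary > 150] <- 30
```

Storing the logical vector first means the second assignment does not depend on what the first one filled in.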

Upvotes: 0
