Reputation: 1764
Hi, I have a dataset with multiple columns that are populated with either NA or "Y". I want to recode these values as 0 and 1, respectively.
I am fairly new to R and am trying to determine the best way to loop through these variables and recode them.
STATE<-c(NA, "WA", "NY", NA, NA)
x<-c(NA,"Y",NA,NA,"Y")
y<-c(NA,NA,"Y",NA,"Y")
z<-c("Y","Y",NA, NA, NA)
mydata<-data.frame(x,y,z)
I have a large dataset, and many of these variables. However, some of them (such as STATE), I wish to leave alone. Any help would be greatly appreciated. Thanks.
Upvotes: 0
Views: 2820
Reputation: 141
I think the best way is to use the mutate_each() function from the dplyr package:
library(dplyr)
STATE <- c(NA, "WA", "NY", NA, NA)
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, STATE)
mydata <- mutate_each(mydata, funs(ifelse(is.na(.), 0, 1)), -STATE)
It applies the function specified inside funs() to each variable. The dot (.) stands for the variable. To skip one or more variables, write their names with a - before them: -var1, -var2, ...
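For readers on a recent dplyr: mutate_each() and funs() have since been deprecated, and across() inside mutate() is the usual replacement. Here is a minimal sketch of the same recode, assuming dplyr 1.0.0 or later is installed; the name `recoded` is just illustrative and not part of the original answer.

```r
library(dplyr)

STATE <- c(NA, "WA", "NY", NA, NA)
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, STATE, stringsAsFactors = FALSE)

# Recode every column except STATE: NA becomes 0, anything else becomes 1.
# `.x` plays the same role as the dot in funs().
recoded <- mydata %>%
  mutate(across(-STATE, ~ ifelse(is.na(.x), 0, 1)))

recoded
```

As in the original call, any non-NA value is mapped to 1, so this relies on the flag columns containing only NA or "Y".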
Upvotes: 1
Reputation: 421
First, you need to make sure the character vectors are not coded as factors:
mydata <- data.frame(x, y, z, stringsAsFactors = FALSE)
Then:
mydata[mydata=="Y"] <- 1
mydata[is.na(mydata)] <- 0
mydata
x y z
1 0 0 1
2 1 0 1
3 0 1 0
4 0 0 0
5 1 1 0
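One thing to watch with this approach: because the columns start out as character, the assigned 1 and 0 are coerced to the strings "1" and "0", even though the printout looks numeric. A small sketch of checking and fixing that, assuming the data frame holds only the flag columns (a column like STATE would have its NA values overwritten with 0 as well):

```r
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, stringsAsFactors = FALSE)

mydata[mydata == "Y"] <- 1
mydata[is.na(mydata)] <- 0

# The columns are still character at this point.
classes_before <- sapply(mydata, class)

# Convert every column to numeric while keeping the data frame structure.
mydata[] <- lapply(mydata, as.numeric)
classes_after <- sapply(mydata, class)
```

Assigning into `mydata[]` rather than `mydata` keeps the result a data frame instead of turning it into a plain list.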
Upvotes: 0
Reputation: 2136
You can use ifelse:
ifelse(is.na(mydata), 0, ifelse(mydata == "Y", 1, mydata))
This replaces elements of mydata with 0 if they are NA and with 1 if they are "Y", and keeps them as they are if they are anything else.
You added the binary tag. R has a logical type (TRUE/FALSE), so if you want binary values, you should use
ifelse(is.na(mydata), FALSE, ifelse(mydata == "Y", TRUE, mydata))
instead.
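Note that ifelse() returns something shaped like its test, and is.na() on a data frame gives a matrix, so the calls above hand back a matrix rather than a data frame. If you want logical columns while leaving STATE alone, a column-wise version may be easier to work with. This is only a sketch; `flag_cols` is a name I made up for the columns to recode.

```r
STATE <- c(NA, "WA", "NY", NA, NA)
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(STATE, x, y, z, stringsAsFactors = FALSE)

# Every column except STATE is treated as a flag column.
flag_cols <- setdiff(names(mydata), "STATE")

# TRUE where the value is "Y", FALSE where it is NA or anything else.
mydata[flag_cols] <- lapply(mydata[flag_cols],
                            function(col) !is.na(col) & col == "Y")

mydata
```

Wrapping the expression in as.integer() would give 0/1 instead of FALSE/TRUE.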
Upvotes: 2