kyg

Reputation: 105

How to recode variables in R

I am trying to recode variables in an R dataframe. For example, variable X from my dataset contains 1's and 0's. I want to create another variable Y which recodes the 1's and 0's from X into Yes and No respectively.

I tried this to create the recoded Y variable:

w <- as.character()

for (i in seq_along(x))  {
    if (x[i] == 1)  {
        recode <- "Yes"
    } else if (x[i] == 0)  {
        recode <- "No"       
    }
    w <- cbind(w, recode)
}

Then I did this to line-up X and Y together:

y <- c(x, w)

What I got back was this:

 y
 # [1] "1"   "1"   "0"   "1"   "0"   "0"   "1"   "1"   "0"   "1"   "0"   "0"   "Yes" "Yes" "No"  "Yes" "No"  "No" 

I was expecting a dataframe with X & Y columns.

Questions:

  1. How do I get X and Y into a dataframe?
  2. Is there a better way for recoding variables in a dataframe?

Upvotes: 0

Views: 2889

Answers (3)

arvi1000

Reputation: 9582

Recoding is generally about applying new labels to the levels of a factor (categorical variable).

In R, you do that like this:

w <- factor(x, levels = c(1,0), labels = c('yes', 'no'))
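To get X and the recoded values side by side, the factor can go straight into a data frame. This is only a minimal sketch: the sample vector `x` is made up for illustration, and the capitalised labels are chosen to match the question rather than the line above.

```r
# Hypothetical sample data standing in for the asker's variable
x <- c(1, 1, 0, 1, 0, 0)

# Recode by relabelling the factor levels, then pair with the original column
df <- data.frame(
  x = x,
  y = factor(x, levels = c(1, 0), labels = c("Yes", "No"))
)

# Any value of x not listed in `levels` (e.g. 2 or NA) would become NA in y
str(df)
```

Because `y` is a factor, `levels(df$y)` keeps the order given in `labels`, which can matter for tables and plots.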

Upvotes: 3

Phil

Reputation: 4444

Using the following data:

x  <- c(rep.int(0, 10), rep.int(1, 10))
df <- as.data.frame(x)
df
#    x
# 1  0
# 2  0
# 3  0
# ...

I'd create a new variable and recode in one step:

df$y[df$x == 1] <- "yes"
df$y[df$x == 0] <- "no"
df
#    x   y
# 1  0  no
# 2  0  no
# 3  0  no
# ...
# 11 1 yes
# 12 1 yes
# 13 1 yes
# ...

Note that for loops are rarely the best tool for this in R, but your loop is basically correct. You need to replace w <- cbind(w, recode) with w <- rbind(w, recode) in the loop itself and, in the final step, you can cbind x and w:

w <- as.character()
for (i in seq_along(x))  {
  if (x[i] == 1)  {
    recode <- "Yes"
  } else if (x[i] == 0)  {
    recode <- "No"       
  }
  w <- rbind(w, recode)
}
y <- cbind(x, w)
y

rbind() appends rows, cbind() appends columns, and c() concatenates its arguments into a single vector (coercing the numbers to character here), which is why you were getting one long vector instead of two columns.
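To see how these functions differ on the same inputs, here is a small sketch. The vectors `x` and `w` are hypothetical stand-ins for the original data and the loop's output; `data.frame()` is included because the question asks for a dataframe.

```r
x <- c(1, 0, 1)             # hypothetical sample input
w <- c("Yes", "No", "Yes")  # recoded values, as the loop would produce them

combined  <- c(x, w)        # one character vector of length 6
as_matrix <- cbind(x, w)    # 3 x 2 character matrix; x is coerced to strings
as_df     <- data.frame(x = x, y = w)  # data frame; x stays numeric

str(combined)
str(as_matrix)
str(as_df)
```

Note that cbind() on plain vectors gives a matrix, which holds a single type, so `data.frame()` is the safer choice when the numeric column should stay numeric.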

Upvotes: 1

Konrad Rudolph

Reputation: 545568

This is one of the many cases where you really shouldn’t use a loop in R.

Instead, use vectorisation, i.e. ifelse or indexing.

result = data.frame(x = x, y = ifelse(x == 1, 'yes', 'no'))

(This assumes that there are only 1s and 0s in the input; if that isn’t the case, you need a nested ifelse or a list containing the translations.)
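For inputs with more than two values, a translation table could look roughly like the sketch below. It uses a named character vector rather than a list, and the sample input and the "maybe" label are invented for illustration.

```r
x <- c(1, 0, 2, 1, NA)  # hypothetical input with an extra value and a missing one

# Named character vector acting as the translation table
translations <- c("0" = "no", "1" = "yes", "2" = "maybe")

# Look up each value by its character form; unname() drops the lookup names
y <- unname(translations[as.character(x)])

result <- data.frame(x = x, y = y)
result
```

Values that are missing from the table, including NA, come back as NA instead of being silently mapped to 'no', which is what the single ifelse would do for any non-NA value other than 1.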

Upvotes: 1
