Waldir Leoncio

Reputation: 11341

Define factors whose levels depend on another variable

Consider this mock data:

set.seed(20120220)
x  <- c(rep("a", 4), rep("b", 4))
y  <- c(sample(c(1, 2), 8, replace = TRUE))
z  <- data.frame(cbind(x, y))

Data frame z will look like this:

  x y
1 a 1
2 a 1
3 a 1
4 a 2
5 b 2
6 b 1
7 b 2
8 b 2

I want to run something akin to factor(z$y, levels = 1:2, labels = c("alpha", "beta")), but I don't want every 1 to become alpha and every 2 to become beta. I want that to happen only for x = a. If x = b, I want 1 to become gamma and 2 to become delta.

In other words, I want my data frame to look like this:

  x y
1 a alpha
2 a alpha
3 a alpha
4 a beta
5 b delta
6 b gamma
7 b delta
8 b delta

This is what I came up with so far:

for (i in 1:nrow(z)) {
  if (z$x[i] == "a") 
    z$y[i] <- factor(z$y[i], levels = 1:2, labels = c("alpha", "beta"))
  else
    z$y[i] <- factor(z$y[i], levels = 1:2, labels = c("gamma", "delta"))
}

But it gives me several warning messages (one for each i) like this:

Warning messages:
1: In `[<-.factor`(`*tmp*`, i, value = c(NA, 1L, 1L, 2L, 2L, 1L, 2L,  :
  invalid factor level, NAs generated

Then, when I print z again, the data frame is a mess: every y has become <NA>.

I bet there's a simple solution for this, but I've been trying several approaches for hours to no avail. My head is about to explode! Help!

Upvotes: 0

Views: 546

Answers (4)

Waldir Leoncio

Reputation: 11341

I've managed to come up with a solution that works, even though it is quite messy.

First, create a subset of the data frame z for each value of x:

z1 <- subset(z, x == "a")
z2 <- subset(z, x == "b")

Then, apply factor() to each subset:

z1$y <- factor(z1$y, levels = 1:2, labels = c("alpha", "beta"))
z2$y <- factor(z2$y, levels = 1:2, labels = c("gamma", "delta"))

And finally, reunite the subsets into the original object.

z <- rbind(z1, z2)
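
The subset, relabel and recombine steps above can also be sketched with split() and unsplit(), which scales to more than two groups and keeps the original row order. This is only a rough sketch: the name labels_by_x is hypothetical, the data frame is rebuilt by hand from the printed values, and y is assumed to be numeric (if it is stored as character or factor, as data.frame(cbind(x, y)) produces, convert it first with as.integer(as.character(z$y))).

```r
# Rebuild the example data explicitly (values copied from the printed z),
# so the sketch does not depend on the random seed.
z <- data.frame(x = rep(c("a", "b"), each = 4),
                y = c(1, 1, 1, 2, 2, 1, 2, 2),
                stringsAsFactors = FALSE)

# Hypothetical lookup: one label vector per value of x, ordered by y (1, 2).
labels_by_x <- list(a = c("alpha", "beta"),
                    b = c("gamma", "delta"))

# Split y by group, index each group's label vector with its y values,
# then put the pieces back in the original row order.
pieces     <- split(z$y, z$x)
relabelled <- Map(function(v, labs) labs[v], pieces, labels_by_x[names(pieces)])
z$ynew     <- unsplit(relabelled, z$x)

# Optionally store the result as a factor with an explicit level order.
z$ynew <- factor(z$ynew, levels = unlist(labels_by_x, use.names = FALSE))
z
```

Keeping the labels in one list means adding a third group only requires one more list entry.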

Upvotes: 0

Jennie Lavine

Reputation: 11

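Here's one additional step to make digEmAll's merge answer a bit quicker: you can use unique() to pull out all the unique combinations in the data frame. Sort them before attaching the labels, because unique() returns rows in order of first appearance (here a/1, a/2, b/2, b/1), which would otherwise swap gamma and delta.

auxDf <- unique(z)
auxDf <- auxDf[order(auxDf$x, auxDf$y), ]  # sort to a/1, a/2, b/1, b/2
auxDf$newy <- c('alpha', 'beta', 'gamma', 'delta')

Then merge, as in that answer:

newDf <- merge(z, auxDf)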
newDf

Upvotes: 1

digEmAll

Reputation: 57210

What about using merge?

# lookup table mapping each (x, y) combination to its label
# (one row per possible combination)
auxDf <- data.frame(x    = c('a',     'a',    'b',     'b'    ),
                    y    = c( 1,       2,      1,       2     ),
                    newy = c('alpha', 'beta', 'gamma', 'delta'))

# merge the two data frames to get a new data frame with the label column
newDf <- merge(z, auxDf)
newDf
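
One thing to keep in mind: by default merge() sorts the result on the key columns (sort = TRUE), so the rows may come back in a different order than in z. If the original order matters, a lookup with match() on a combined key is one possible alternative. The sketch below assumes y is numeric in both data frames and rebuilds z by hand from the printed values; key_z and key_aux are hypothetical names.

```r
# Example data, rebuilt explicitly from the printed z.
z <- data.frame(x = rep(c("a", "b"), each = 4),
                y = c(1, 1, 1, 2, 2, 1, 2, 2),
                stringsAsFactors = FALSE)

# Same lookup table as in the merge approach.
auxDf <- data.frame(x    = c("a", "a", "b", "b"),
                    y    = c(1, 2, 1, 2),
                    newy = c("alpha", "beta", "gamma", "delta"),
                    stringsAsFactors = FALSE)

# Build one text key per row from the (x, y) pair, e.g. "a 1".
key_z   <- paste(z$x, z$y)
key_aux <- paste(auxDf$x, auxDf$y)

# match() returns, for each row of z, the matching row of auxDf,
# so the labels come back in z's own row order.
z$newy <- factor(auxDf$newy[match(key_z, key_aux)], levels = auxDf$newy)
z
```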

Upvotes: 1

IRTFM

Reputation: 263301

> z$ynew <- ifelse(z$x == "a", ifelse(z$y == 1, "alpha", "beta"),
                                ifelse(z$y == 1, "gamma", "delta"))
> z
  x y  ynew
1 a 1 alpha
2 a 1 alpha
3 a 1 alpha
4 a 2  beta
5 b 2 delta
6 b 1 gamma
7 b 2 delta
8 b 2 delta

If you want ynew to be a factor, then just run: z$ynew <- factor(z$ynew)
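
Nested ifelse() calls get hard to read once there are more than a few combinations. A possible alternative, closer to the factor(..., levels = , labels = ) call in the question, is to combine x and y with interaction() and relabel the combined levels in a single step. This is a sketch that assumes numeric y and a hand-built copy of z; combo is a hypothetical name.

```r
# Example data, rebuilt explicitly from the printed z.
z <- data.frame(x = rep(c("a", "b"), each = 4),
                y = c(1, 1, 1, 2, 2, 1, 2, 2),
                stringsAsFactors = FALSE)

# interaction() builds one factor whose levels are the x.y pairs ("a.1", "b.2", ...).
combo <- interaction(z$x, z$y, sep = ".")

# Relabel the pairs; levels and labels are matched by position.
z$ynew <- factor(combo,
                 levels = c("a.1",   "a.2",  "b.1",   "b.2"),
                 labels = c("alpha", "beta", "gamma", "delta"))
z
```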

Upvotes: 1
