Reputation: 15458

generate column values with multiple conditions in R

I have a data frame z and I want to create a new column based on the values of two existing columns of z. Here is the setup:

>z<-cbind(x=1:10,y=11:20,t=21:30)
> z<-as.data.frame(z)
>z
    x  y  t
1   1 11 21
2   2 12 22
3   3 13 23
4   4 14 24
5   5 15 25
6   6 16 26
7   7 17 27
8   8 18 28
9   9 19 29
10 10 20 30

# Generate column q: 4 times column t where x = 4, and equal to column t for all other values of x.

for (i in 1:nrow(z)){
  z$q[i]=if (z$x[i]==4) 4*z$t[i] else z$t[i]}

But my problem is that I want to apply multiple conditions.

For example, I want to get something like this:

(If x=2, q=t*2; if x=4, q=t*4; if x=7, q=t*3; otherwise q=t)

> z
   x  y  t  q
1   1 11 21 21
2   2 12 22 44
3   3 13 23 23
4   4 14 24 96
5   5 15 25 25
6   6 16 26 26
7   7 17 27 81
8   8 18 28 28
9   9 19 29 29
10 10 20 30 30

How do I get the second output using loops or any other method?

Upvotes: 5

Answers (8)

Tommy O'Dell

Reputation: 7109

Here's a version of SQL's decode in R for character vectors (untested with factors) that operates just like the SQL version, i.e. it takes an arbitrary number of target/replacement pairs and an optional last argument that acts as a default value (note that the default won't overwrite NAs).

I can see it being pretty useful in conjunction with dplyr's mutate operation.

> x <- c("apple","apple","orange","pear","pear",NA)

> decode(x, apple, banana)
[1] "banana" "banana" "orange" "pear"   "pear"   NA      

> decode(x, apple, banana, fruit)
[1] "banana" "banana" "fruit"  "fruit"  "fruit"  NA      

> decode(x, apple, banana, pear, passionfruit)
[1] "banana"       "banana"       "orange"       "passionfruit" "passionfruit" NA            

> decode(x, apple, banana, pear, passionfruit, fruit)
[1] "banana"       "banana"       "fruit"        "passionfruit" "passionfruit" NA

Here's the code I'm using, with a gist I'll keep up to date here (link).

decode <- function(x, ...) {

  args <- as.character(eval(substitute(alist(...))))

  replacements <- args[1:length(args) %% 2 == 0]
  targets      <- args[1:length(args) %% 2 == 1][1:length(replacements)]

  if(length(args) %% 2 == 1)
    x[! x %in% targets & ! is.na(x)] <- tail(args,1)

  for(i in 1:length(targets))
    x <- ifelse(x == targets[i], replacements[i], x)

  return(x)

}

Upvotes: 1

malcook

Reputation: 1733

You can do it in base R, in one line, with the mapping easy to read in the code and no helper functions (OK, one anonymous function). The approach works with negatives and with any atomic vector (reals, characters). Like this:

> transform(z,q=t*sapply(as.character(x),function(x) switch(x,"2"=2,"4"=4,"7"=3,1)))
    x  y  t  q
1   1 11 21 21
2   2 12 22 44
3   3 13 23 23
4   4 14 24 96
5   5 15 25 25
6   6 16 26 26
7   7 17 27 81
8   8 18 28 28
9   9 19 29 29
10 10 20 30 30

Upvotes: 2

Carl Witthoft

Reputation: 21532

I really liked the answer "dinre" posted to flodel's blog:

for (i in 1:length(data_Array)) {
  data_Array[i] <- switch(data_Array[i], banana="apple", orange="pineapple", "fig")
}

One warning: read the help page for switch carefully before passing integer arguments, because a numeric argument selects an alternative by position rather than by name.
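To illustrate the integer caveat, here is a small sketch of how switch behaves with character versus numeric input. The variable names (by_name, by_pos and so on) are made up for the illustration and are not from the original answer.

```r
# Character EXPR: matched against the argument names;
# the unnamed trailing argument acts as the default.
by_name    <- switch("banana", banana = "apple", orange = "pineapple", "fig")
by_default <- switch("kiwi",   banana = "apple", orange = "pineapple", "fig")

# Numeric EXPR: selects the alternative by position, names are ignored.
by_pos <- switch(2, banana = "apple", orange = "pineapple", "fig")

# Numeric EXPR out of range: NULL is returned, the "default" is not used.
out_of_range <- switch(4, banana = "apple", orange = "pineapple", "fig")

by_name               # "apple"
by_default            # "fig"
by_pos                # "pineapple"
is.null(out_of_range) # TRUE
```

So if the values being recoded are numbers, converting them with as.character first is one way to get name-based matching.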

Upvotes: 2

Sven Hohenstein

Reputation: 81733

Here is an easy solution with just one ifelse command:

Calculate the multiplier of t:

ifelse(z$x == 7, 3, z$x ^ (z$x %in% c(2, 4)))

The complete command:

transform(z, q = t * ifelse(x == 7, 3, x ^ (x %in% c(2, 4))))

    x  y  t  q
1   1 11 21 21
2   2 12 22 44
3   3 13 23 23
4   4 14 24 96
5   5 15 25 25
6   6 16 26 26
7   7 17 27 81
8   8 18 28 28
9   9 19 29 29
10 10 20 30 30
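A short breakdown of why the exponent trick works, as I read it: the logical result of %in% is coerced to 0 or 1, so x^TRUE is x and x^FALSE is 1. Note that it relies on the multipliers for 2 and 4 being equal to x itself, so it would not carry over to an arbitrary mapping. The intermediate names (in_set, pow, multiplier) are only for illustration.

```r
x <- 1:10

in_set <- x %in% c(2, 4)   # TRUE only where x is 2 or 4
pow    <- x ^ in_set       # x^1 = x where TRUE, x^0 = 1 elsewhere

# x == 7 is handled separately because its multiplier (3) differs from x
multiplier <- ifelse(x == 7, 3, pow)
multiplier
# [1] 1 2 1 4 1 1 3 1 1 1
```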

Upvotes: 3

Jan Oosting

Reputation: 487

You can also use match to do this. I tend to use this a lot when assigning parameters like col, pch and cex to points in scatterplots.

searchfor<-c(2,4,7)
replacewith<-c(2,4,3)

# generate multiplier column
# q could also be an existing vector where you want to replace certain entries
q<-rep(1,nrow(z))
#
id<-match(z$x,searchfor)
id<-replacewith[id]
# Apply the matches to q
q[!is.na(id)]<-id[!is.na(id)]
# apply to t
z$q<-q*z$t
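The same idea can be written a bit more compactly. This is only a sketch of the match approach above, rebuilt on the question's data so it runs on its own; mult is a name I made up for the multiplier vector.

```r
z <- data.frame(x = 1:10, y = 11:20, t = 21:30)

searchfor   <- c(2, 4, 7)
replacewith <- c(2, 4, 3)

# match() gives the position of each z$x in searchfor, or NA if absent,
# so indexing replacewith with it yields the multiplier or NA
mult <- replacewith[match(z$x, searchfor)]
mult[is.na(mult)] <- 1   # default multiplier for unmatched values

z$q <- z$t * mult
z$q
# [1] 21 44 23 96 25 26 81 28 29 30
```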

Upvotes: 1

flodel

Reputation: 89097

By building a nested ifelse functional by recursion, you can get the benefits of both solutions offered so far: ifelse is fast and can work with any type of data, while @Matthew's solution is more functional yet limited to integers and potentially slow.

decode <- function(x, search, replace, default = NULL) {

   # build a nested ifelse function by recursion
   decode.fun <- function(search, replace, default = NULL)
      if (length(search) == 0) {
         function(x) if (is.null(default)) x else rep(default, length(x))
      } else {
         function(x) ifelse(x == search[1], replace[1],
                                            decode.fun(tail(search, -1),
                                                       tail(replace, -1),
                                                       default)(x))
      }

   return(decode.fun(search, replace, default)(x))
}

Note that the decode function is named after the SQL function. I wish a function like this had made it into base R... Here are a couple of examples illustrating its usage:

decode(x = 1:5, search = 3, replace = -1)
# [1]  1  2 -1  4  5
decode(x = 1:5, search = c(2, 4), replace = c(20, 40), default = 3)
# [1]  3 20  3 40  3

For your particular problem:

transform(z, q = decode(x, search = c(2,4,7), replace = c(2,4,3), default = 1) * t)

#    x  y  t  q
# 1   1 11 21 21
# 2   2 12 22 44
# 3   3 13 23 23
# 4   4 14 24 96
# 5   5 15 25 25
# 6   6 16 26 26
# 7   7 17 27 81
# 8   8 18 28 28
# 9   9 19 29 29
# 10 10 20 30 30

Upvotes: 10

Metrics

Reputation: 15458

Based on the suggestion of Señor:

> z$q <- ifelse(z$x == 2, z$t * 2,
         ifelse(z$x == 4, z$t * 4,
         ifelse(z$x == 7, z$t * 3,
                          z$t * 1)))
> z
    x  y  t  q
1   1 11 21 21
2   2 12 22 44
3   3 13 23 23
4   4 14 24 96
5   5 15 25 25
6   6 16 26 26
7   7 17 27 81
8   8 18 28 28
9   9 19 29 29
10 10 20 30 30

Upvotes: 3

Matthew Lundberg

Reputation: 42689

Generate a multiplier vector:

tt <- rep(1, max(z$x))
tt[2] <- 2
tt[4] <- 4
tt[7] <- 3

And here is your new column:

> z$t * tt[z$x]
 [1] 21 44 23 96 25 26 81 28 29 30

> z$q <- z$t * tt[z$x]
> z
    x  y  t  q
1   1 11 21 21
2   2 12 22 44
3   3 13 23 23
4   4 14 24 96
5   5 15 25 25
6   6 16 26 26
7   7 17 27 81
8   8 18 28 28
9   9 19 29 29
10 10 20 30 30

This will not work if there are negative values in z$x.
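To make the limitation concrete: positive and negative subscripts cannot be mixed in R, and an all-negative index drops elements instead of looking them up. One possible workaround, which is a different technique from the positional vector above, is a lookup vector indexed by name. The sample values and the names lookup, x_neg and mult are hypothetical.

```r
tt <- c(1, 2, 1, 4)

# Mixing a negative and a positive subscript is an error
failed <- inherits(try(tt[c(-1, 2)], silent = TRUE), "try-error")
failed
# [1] TRUE

# Alternative sketch: index a named vector by the character form of x
lookup <- c("-2" = 5, "4" = 4, "7" = 3)
x_neg  <- c(-2, 1, 4, 7, 0)

mult <- unname(lookup[as.character(x_neg)])  # NA where there is no entry
mult[is.na(mult)] <- 1                       # default multiplier
mult
# [1] 5 1 4 3 1
```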

Edited

Here is a generalization of the above, where a function is used to generate the multiplier vector. In fact, we create a function based on parameters.

We want to transform the following values:

2 -> 2
4 -> 4
7 -> 3

Otherwise a default of 1 is taken.

Here is a function which generates the desired function:

f <- function(default, x, y) {
  x.min <- min(x)
  x.max <- max(x)
  y.vals <- rep(default, x.max-x.min+1)
  y.vals[x-x.min+1] <- y

  function(z) {
    result <- rep(default, length(z))
    tmp <- z>=x.min & z<=x.max
    result[tmp] <- y.vals[z[tmp]-x.min+1]
    result
  }
}

Here is how we use it:

x <- c(2,4,7)
y <- c(2,4,3)

g <- f(1, x, y)

g is the function that we want. It should be clear that any mapping can be supplied via the x and y parameters to f.

g(z$x)
## [1] 1 2 1 4 1 1 3 1 1 1

g(z$x)*z$t
## [1] 21 44 23 96 25 26 81 28 29 30

It should be clear this only works for integer values.

Upvotes: 3
