Yatrosin
Yatrosin

Reputation: 341

Using ifelse statement in R dataframe to generate additional variables

I've got a dataframe in R with the following possible combinations in the first two columns:

 V1| V2| V3| V4
---|---|---|---
 0 | 0 | NA| NA
---|---|---|---
 0 | 1 | NA| NA
---|---|---|---
 0 | 2 | NA| NA
---|---|---|---
 1 | 0 | NA| NA
---|---|---|---
 1 | 1 | NA| NA
---|---|---|---
 1 | 2 | NA| NA
---|---|---|---
 2 | 0 | NA| NA
---|---|---|---
 2 | 1 | NA| NA
---|---|---|---
 2 | 2 | NA| NA

I would like to generate two ifelse statements or one if possible, so that I can get these two additional columns based on the different combinations:

 V1| V2| V3| V4
---|---|---|---
 0 | 0 | 0 | AA
---|---|---|---
 0 | 1 | 1 | AD
---|---|---|---
 0 | 2 | 2 | DD
---|---|---|---
 1 | 0 | 0 | AB
---|---|---|---
 1 | 1 | NA| NA
---|---|---|---
 1 | 2 | 1 | CD
---|---|---|---
 2 | 0 | 0 | BB
---|---|---|---
 2 | 1 | 0 | BC
---|---|---|---
 2 | 2 | 0 | CC

I'm really stuck at this point and none of the options I have tried work.

If I try something like this:

DF$V3 <- if((DF$V1=2) & (DF$V2 = 2)) {DF$V3 = 0}

All values in V1 and V2 are converted into 2 and all values in V3 are converted to 0.

If I use the elseif command in the following way:

DF$V3 <- elseif((DF$V1=2) & (DF$V2 = 2)) {DF$V3 = 0}

I get Error: could not find function "elseif"

I have read several forums about nested if and elseif statements in R, but I'm not able to figure out how to get the results I want using two conditions on two different columns.

Can somebody suggest any options?

Thank you very much,

Best,

Yatrosin

Upvotes: 1

Views: 7908

Answers (4)

r2evans
r2evans

Reputation: 160407

Up front: I think the use of ifelse statements in this problem is strongly ill-advised. It requires significant nesting, sacrificing performance and readability. Though these two solutions may be a little harder if you aren't familiar with mapply or table-join-calculus, the payoff in stability and performance will far outweigh the time to learn these techniques.

Two methods:

Lookup matrix

One way is to define look-up arrays, where the row names reflect the possible V1 values, and the column names reflect the possible V2 values. (Note that when referencing these lookup matrices, one must use as.character if your values are numeric/integer, since otherwise they will look for the slice/row number, not the specific matching column/row.)

Examples:

dat <- data.frame(
  V1 = c(0,0,0,1,1,1,2,2,2),
  V2 = c(0,1,2,0,1,2,0,1,2)
)
dmnms <- list(c(0,1,2), c(0,1,2))
m3 <- matrix(c(0, 1, 2,
               0, NA, 1,
               0, 0, 0),
             nrow = 3, byrow = TRUE, dimnames = dmnms)
m4 <- matrix(c("AA", "AD", "DD",
               "AB", NA, "CD",
               "BB", "BC", "CC"),
             nrow = 3, byrow = TRUE, dimnames = dmnms)

m3
#   0  1 2
# 0 0  1 2
# 1 0 NA 1
# 2 0  0 0
m4
#   0    1    2   
# 0 "AA" "AD" "DD"
# 1 "AB" NA   "CD"
# 2 "BB" "BC" "CC"

in this case, notice the 0, 1, and 2 in the row/column margins. In a matrix with no names, these are typically [1,], [2,], etc, indicating that actual names are not available, instead reflecting just the row number. However, since these are character (no brackets/commas), they can be referenced directly, ala

m3["0","2"]
# [1] 2
m4["1","0"]
# [1] "AB"

From here, you just need to map these lookups into new columns, something like:

dat$V3 <- mapply(`[`, list(m3), as.character(dat$V1), as.character(dat$V2))
dat$V4 <- mapply(`[`, list(m4), as.character(dat$V1), as.character(dat$V2))
dat
#   V1 V2 V3   V4
# 1  0  0  0   AA
# 2  0  1  1   AD
# 3  0  2  2   DD
# 4  1  0  0   AB
# 5  1  1 NA <NA>
# 6  1  2  1   CD
# 7  2  0  0   BB
# 8  2  1  0   BC
# 9  2  2  0   CC

Joining data.frame

Another method is to join a known data.frame onto your data. This has an added benefit of easily expanding to more than two criteria. (Technically, the matrix method can expand to more than 2, in which case it would be an n-dim array, but it is often a little harder to edit, manage, and visualize.)

In your example, this doesn't initially gain you much, since you need to pre-define your data.frame, but I'm guessing that this is just representative data, and your conditional classification is on much more data.

I'll define the joiner data.frame that will be used against your actual data. This is the reference data, from which all input permutations will be defined into the respective V3 and V4 values.

joiner <- data.frame(
  V1 = c(0,0,0,1,1,1,2,2,2),
  V2 = c(0,1,2,0,1,2,0,1,2),
  V3 = c(0, 1, 2, 0, NA, 1, 0, 0, 0),
  V4 = c("AA", "AD", "DD", "AB", NA, "CD", "BB", "BC", "CC"),
  stringsAsFactors = FALSE
)

I'll create a sample second data to demonstrate the merge:

dat2 <- data.frame(
  V1 = c(2, 0, 1, 0),
  V2 = c(0, 1, 2, 2)
)
merge(dat2, joiner, by = c("V1", "V2"))
#   V1 V2 V3 V4
# 1  0  1  1 AD
# 2  0  2  2 DD
# 3  1  2  1 CD
# 4  2  0  0 BB

Edit: if you are concerned about dropping rows, then add all.x=TRUE to the merge command. If (as you saw based on your comment) you use all=TRUE, this is a full join in SQL parlance, meaning it will keep all rows from both tables, even if there is not a match made. This may be better explained by referencing this answer and noting that I'm suggesting a left join with all.x, keeping all on the left (first argument), only merging in rows on the right where a match is made.

(Note: this can also be done quite easily using dplyr and data.table packages.)

Upvotes: 3

Damian
Damian

Reputation: 1433

Building on the first half of the post by @r2evans, using named vectors to do multidimensional lookups instead of matrices. The logic is the same, but I find the syntax to be easier to interpret/debug.

dat <- data.frame(
    V1 = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
    V2 = c(0, 1, 2, 0, 1, 2, 0, 1, 2)
) 

# Use a named vector to store key-value pairs:
#   keys: combinations of V1 and V2 in text form
#   values: the desired result associated with each combination

V3_lookup <- c("0,0" = 0,
               "0,1" = 1,
               "0,2" = 2,
               "1,0" = 0,
               "1,1" = NA,
               "1,2" = 1,
               "2,0" = 0,
               "2,1" = 0,
               "2,2" = 0)

V4_lookup <- c("0,0" = "AA",
               "0,1" = "AD",
               "0,2" = "DD",
               "1,0" = "AB",
               "1,1" =  NA,
               "1,2" = "CD",
               "2,0" = "BB",
               "2,1" = "BC",
               "2,2" = "CC")

# Create a character vector with the inputs 
# (the combinations of V1 and V2 that actually occur in the data)

id <- paste(dat$V1, dat$V2, sep = ",")
id
[1] "0,0" "0,1" "0,2" "1,0" "1,1" "1,2" "2,0" "2,1" "2,2"


# Map these strings to the desired outputs

dat$V3 <- V3_lookup[ id ]
dat$V4 <- V4_lookup[ id ]

# Result
dat

  V1 V2 V3   V4
1  0  0  0   AA
2  0  1  1   AD
3  0  2  2   DD
4  1  0  0   AB
5  1  1 NA <NA>
6  1  2  1   CD
7  2  0  0   BB
8  2  1  0   BC
9  2  2  0   CC

Upvotes: 2

B Williams
B Williams

Reputation: 2050

In your example you have a suite of potential outputs:

library(dplyr)
df <- data.frame(V1 = rep(0:2, each=3), V2 = rep(0:2, 3))

df %>% 
   mutate(V3 = ifelse(V1==0 & V2==1 | V1==1 & V2==2, 1, 
                  ifelse(V1==0 & V2==2, 2, 
                         ifelse(V1==1 & V2==1, NA,0))),
      V4 = ifelse(V1==0 & V2==0, 'AA',
                  ifelse(V1==0 & V2==1, 'AD',
                         ifelse(V1==0 & V2==2, 'DD',
                                ifelse(V1==1 & V2==0, 'AB',
                                       ifelse(is.na(V3), 'NA',
                                              ifelse(V1==1 & V2==2, 'CD',
                                                     ifelse(V1==2 & V2==0, 'BD',
                                                            ifelse(V1==2 & V2==1, 'BC','CC')))))))))

Without a more complete understanding of your data this is about the best I can come up with.

Upvotes: 1

Mako212
Mako212

Reputation: 7292

You can use an ifelse() statement with multiple conditions:

ifelse(a == 1 & b == 2, TRUE, FALSE)

In R you need to use == to check for equality.

For example:

df <- data.frame(
+ x = c(1,1,2,2),
+ y = c(3,3,4,4))


df$z <- ifelse(df$x==1 & df$y==3, "Yes","No")

df
  x y    z
1 1 3  Yes
2 1 3  Yes
3 2 4   No
4 2 4   No

Upvotes: 0

Related Questions