Reputation: 341
I've got a dataframe in R with the following possible combinations in the first two columns:
V1 | V2 | V3 | V4
---|---|---|---
0 | 0 | NA | NA
0 | 1 | NA | NA
0 | 2 | NA | NA
1 | 0 | NA | NA
1 | 1 | NA | NA
1 | 2 | NA | NA
2 | 0 | NA | NA
2 | 1 | NA | NA
2 | 2 | NA | NA
I would like to write two ifelse statements (or one, if possible) that fill in these two additional columns based on the different combinations:
V1 | V2 | V3 | V4
---|---|---|---
0 | 0 | 0 | AA
0 | 1 | 1 | AD
0 | 2 | 2 | DD
1 | 0 | 0 | AB
1 | 1 | NA | NA
1 | 2 | 1 | CD
2 | 0 | 0 | BB
2 | 1 | 0 | BC
2 | 2 | 0 | CC
I'm really stuck at this point and none of the options I have tried work.
If I try something like this:
DF$V3 <- if((DF$V1=2) & (DF$V2 = 2)) {DF$V3 = 0}
All values in V1 and V2 are converted into 2 and all values in V3 are converted to 0.
If I use the elseif command in the following way:
DF$V3 <- elseif((DF$V1=2) & (DF$V2 = 2)) {DF$V3 = 0}
I get Error: could not find function "elseif"
I have read several forums about nested if and elseif statements in R, but I'm not able to figure out how to get the results I want using two conditions on two different columns.
Can somebody suggest any options?
Thank you very much,
Best,
Yatrosin
Upvotes: 1
Views: 7908
Reputation: 160407
Up front: I think the use of `ifelse` statements in this problem is strongly ill-advised. It requires significant nesting, sacrificing performance and readability. Though these two solutions may be a little harder if you aren't familiar with `mapply` or table-join-calculus, the payoff in stability and performance will far outweigh the time to learn these techniques.
Two methods:
One way is to define look-up matrices, where the row names reflect the possible `V1` values, and the column names reflect the possible `V2` values. (Note that when referencing these lookup matrices, you must use `as.character` if your values are numeric/integer, since otherwise R will index by row/column position, not by the matching row/column name.)
Examples:
dat <- data.frame(
V1 = c(0,0,0,1,1,1,2,2,2),
V2 = c(0,1,2,0,1,2,0,1,2)
)
dmnms <- list(c(0,1,2), c(0,1,2))
m3 <- matrix(c(0, 1, 2,
0, NA, 1,
0, 0, 0),
nrow = 3, byrow = TRUE, dimnames = dmnms)
m4 <- matrix(c("AA", "AD", "DD",
"AB", NA, "CD",
"BB", "BC", "CC"),
nrow = 3, byrow = TRUE, dimnames = dmnms)
m3
# 0 1 2
# 0 0 1 2
# 1 0 NA 1
# 2 0 0 0
m4
# 0 1 2
# 0 "AA" "AD" "DD"
# 1 "AB" NA "CD"
# 2 "BB" "BC" "CC"
In this case, notice the `0`, `1`, and `2` in the row/column margins. In a matrix with no names, these are typically `[1,]`, `[2,]`, etc, indicating that actual names are not available, instead reflecting just the row number. However, since these are `character` (no brackets/commas), they can be referenced directly, ala
m3["0","2"]
# [1] 2
m4["1","0"]
# [1] "AB"
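To make the `as.character` caveat concrete, here is a small sketch that rebuilds `m3` as defined above and indexes it once by position and once by name. The variable names `by_position` and `by_name` are only for illustration.

```r
# Rebuild the V3 lookup matrix from above; the dimnames are stored as character.
dmnms <- list(c(0, 1, 2), c(0, 1, 2))
m3 <- matrix(c(0, 1, 2,
               0, NA, 1,
               0, 0, 0),
             nrow = 3, byrow = TRUE, dimnames = dmnms)

# Numeric indices select by position: 2nd row, 2nd column,
# which is the cell for V1 = 1, V2 = 1.
by_position <- m3[2, 2]

# Character indices select by name: row "2", column "2",
# which is the cell for V1 = 2, V2 = 2.
by_name <- m3[as.character(2), as.character(2)]

by_position
# [1] NA
by_name
# [1] 0
```

The two results differ because the values start at 0 while positions start at 1, so passing the raw numeric columns would silently look up the wrong cell.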
From here, you just need to map these lookups into new columns, something like:
dat$V3 <- mapply(`[`, list(m3), as.character(dat$V1), as.character(dat$V2))
dat$V4 <- mapply(`[`, list(m4), as.character(dat$V1), as.character(dat$V2))
dat
# V1 V2 V3 V4
# 1 0 0 0 AA
# 2 0 1 1 AD
# 3 0 2 2 DD
# 4 1 0 0 AB
# 5 1 1 NA <NA>
# 6 1 2 1 CD
# 7 2 0 0 BB
# 8 2 1 0 BC
# 9 2 2 0 CC
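As a possible alternative to the `mapply` calls, base R also accepts a character matrix with one column per dimension as an index into a matrix that has dimnames, where each row of the index picks one element. The sketch below assumes the same `dat`, `m3`, and `m4` as above (rebuilt here so it runs on its own); `idx` is a name introduced only for this example.

```r
dat <- data.frame(
  V1 = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
  V2 = c(0, 1, 2, 0, 1, 2, 0, 1, 2)
)
dmnms <- list(c("0", "1", "2"), c("0", "1", "2"))
m3 <- matrix(c(0, 1, 2,
               0, NA, 1,
               0, 0, 0),
             nrow = 3, byrow = TRUE, dimnames = dmnms)
m4 <- matrix(c("AA", "AD", "DD",
               "AB", NA, "CD",
               "BB", "BC", "CC"),
             nrow = 3, byrow = TRUE, dimnames = dmnms)

# One row per observation: column 1 is the row name, column 2 the column name.
idx <- cbind(as.character(dat$V1), as.character(dat$V2))

# Each row of idx is matched against the dimnames and selects a single cell.
dat$V3 <- m3[idx]
dat$V4 <- m4[idx]
dat
```

This avoids calling `[` once per row, which may matter on larger data, though I have not benchmarked it here.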
Another method is to join a known `data.frame` onto your data. This has an added benefit of easily expanding to more than two criteria. (Technically, the `matrix` method can expand to more than 2, in which case it would be an `n`-dim `array`, but it is often a little harder to edit, manage, and visualize.)
In your example, this doesn't initially gain you much, since you need to pre-define your data.frame, but I'm guessing that this is just representative data, and your conditional classification is on much more data.
I'll define the joiner `data.frame` that will be used against your actual data. This is the reference data, which maps every input combination to its respective `V3` and `V4` values.
joiner <- data.frame(
V1 = c(0,0,0,1,1,1,2,2,2),
V2 = c(0,1,2,0,1,2,0,1,2),
V3 = c(0, 1, 2, 0, NA, 1, 0, 0, 0),
V4 = c("AA", "AD", "DD", "AB", NA, "CD", "BB", "BC", "CC"),
stringsAsFactors = FALSE
)
I'll create a second sample data.frame to demonstrate the merge:
dat2 <- data.frame(
V1 = c(2, 0, 1, 0),
V2 = c(0, 1, 2, 2)
)
merge(dat2, joiner, by = c("V1", "V2"))
# V1 V2 V3 V4
# 1 0 1 1 AD
# 2 0 2 2 DD
# 3 1 2 1 CD
# 4 2 0 0 BB
Edit: if you are concerned about dropping rows, then add `all.x=TRUE` to the `merge` command. If (as you saw based on your comment) you use `all=TRUE`, this is a full join in SQL parlance, meaning it will keep all rows from both tables, even if there is not a match made. This may be better explained by referencing this answer and noting that I'm suggesting a left join with `all.x`, keeping all on the left (first argument), only merging in rows on the right where a match is made.
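A minimal sketch of the difference, assuming the same `joiner` as above and a hypothetical `dat3` that contains one combination (`V1 = 3`, `V2 = 1`) with no entry in the reference table:

```r
joiner <- data.frame(
  V1 = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
  V2 = c(0, 1, 2, 0, 1, 2, 0, 1, 2),
  V3 = c(0, 1, 2, 0, NA, 1, 0, 0, 0),
  V4 = c("AA", "AD", "DD", "AB", NA, "CD", "BB", "BC", "CC"),
  stringsAsFactors = FALSE
)

# Hypothetical input: the last row (3, 1) has no match in joiner.
dat3 <- data.frame(
  V1 = c(2, 0, 3),
  V2 = c(0, 1, 1)
)

# Default merge is an inner join: the unmatched row is dropped.
inner <- merge(dat3, joiner, by = c("V1", "V2"))

# all.x = TRUE is a left join: every row of dat3 is kept,
# and unmatched rows get NA in V3 and V4.
left <- merge(dat3, joiner, by = c("V1", "V2"), all.x = TRUE)

nrow(inner)
# [1] 2
nrow(left)
# [1] 3
```

Note that `merge` may reorder rows relative to the input, so if the original order matters you would need to carry a row identifier through the join.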
(Note: this can also be done quite easily using the `dplyr` and `data.table` packages.)
Upvotes: 3
Reputation: 1433
Building on the first half of the post by @r2evans, you can use named vectors instead of matrices to do multidimensional lookups. The logic is the same, but I find the syntax to be easier to interpret/debug.
dat <- data.frame(
V1 = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
V2 = c(0, 1, 2, 0, 1, 2, 0, 1, 2)
)
# Use a named vector to store key-value pairs:
# keys: combinations of V1 and V2 in text form
# values: the desired result associated with each combination
V3_lookup <- c("0,0" = 0,
"0,1" = 1,
"0,2" = 2,
"1,0" = 0,
"1,1" = NA,
"1,2" = 1,
"2,0" = 0,
"2,1" = 0,
"2,2" = 0)
V4_lookup <- c("0,0" = "AA",
"0,1" = "AD",
"0,2" = "DD",
"1,0" = "AB",
"1,1" = NA,
"1,2" = "CD",
"2,0" = "BB",
"2,1" = "BC",
"2,2" = "CC")
# Create a character vector with the inputs
# (the combinations of V1 and V2 that actually occur in the data)
id <- paste(dat$V1, dat$V2, sep = ",")
id
# [1] "0,0" "0,1" "0,2" "1,0" "1,1" "1,2" "2,0" "2,1" "2,2"
# Map these strings to the desired outputs
dat$V3 <- V3_lookup[ id ]
dat$V4 <- V4_lookup[ id ]
# Result
dat
#   V1 V2 V3   V4
# 1  0  0  0   AA
# 2  0  1  1   AD
# 3  0  2  2   DD
# 4  1  0  0   AB
# 5  1  1 NA <NA>
# 6  1  2  1   CD
# 7  2  0  0   BB
# 8  2  1  0   BC
# 9  2  2  0   CC
Upvotes: 2
Reputation: 2050
In your example you have a suite of potential outputs:
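library(dplyr)
df <- data.frame(V1 = rep(0:2, each = 3), V2 = rep(0:2, 3))
df %>%
  mutate(
    # V3: 1 for (0,1) and (1,2), 2 for (0,2), NA for (1,1), otherwise 0
    V3 = ifelse(V1 == 0 & V2 == 1 | V1 == 1 & V2 == 2, 1,
         ifelse(V1 == 0 & V2 == 2, 2,
         ifelse(V1 == 1 & V2 == 1, NA, 0))),
    # V4: one label per combination; NA_character_ gives a real missing
    # value rather than the string 'NA'
    V4 = ifelse(V1 == 0 & V2 == 0, 'AA',
         ifelse(V1 == 0 & V2 == 1, 'AD',
         ifelse(V1 == 0 & V2 == 2, 'DD',
         ifelse(V1 == 1 & V2 == 0, 'AB',
         ifelse(is.na(V3), NA_character_,
         ifelse(V1 == 1 & V2 == 2, 'CD',
         ifelse(V1 == 2 & V2 == 0, 'BB',
         ifelse(V1 == 2 & V2 == 1, 'BC', 'CC'))))))))
  )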
Without a more complete understanding of your data this is about the best I can come up with.
Upvotes: 1
Reputation: 7292
You can use an `ifelse()` statement with multiple conditions:
ifelse(a == 1 & b == 2, TRUE, FALSE)
In R you need to use `==` to check for equality; a single `=` is assignment.
For example:
df <- data.frame(
  x = c(1,1,2,2),
  y = c(3,3,4,4))
df$z <- ifelse(df$x==1 & df$y==3, "Yes","No")
df
#   x y   z
# 1 1 3 Yes
# 2 1 3 Yes
# 3 2 4  No
# 4 2 4  No
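Applied to the data in the question, a minimal sketch for just the `V1 == 2 & V2 == 2` case might look like the following. The name `both_two` is introduced here only for illustration, and the remaining combinations would still need their own conditions.

```r
# Rebuild the question's nine combinations, with V3 initially missing.
DF <- data.frame(
  V1 = rep(0:2, each = 3),
  V2 = rep(0:2, times = 3),
  V3 = NA
)

# `==` compares element-wise and `&` combines the two logical vectors,
# so V1 and V2 themselves are left unchanged.
both_two <- DF$V1 == 2 & DF$V2 == 2

# Where the condition holds, set V3 to 0; elsewhere keep the existing value.
DF$V3 <- ifelse(both_two, 0, DF$V3)
DF
```

Unlike the attempt with `=`, this only touches `V3` in the one matching row and leaves every other row as `NA`.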
Upvotes: 0