Reputation: 2684
I'm trying to obtain a factor vector X whose values depend on two (maybe more) columns in a data frame, so it can have more than two levels.
There is an easy way to do this with C/C++-like conditional statements in a for loop. Say I'm constructing X from the values in two boolean columns, Col1 and Col2, of a data frame MATRIX; I can do it as:
X=vector()
for ( i in 1:nrow(MATRIX)) {
  if (MATRIX$Col1[i]==1 && MATRIX$Col2[i]==1) {
    X[i] = "both"
  } else if (MATRIX$Col1[i]==1) {
    X[i] = "col1"
  } else if (MATRIX$Col2[i]==1) {
    X[i] = "col2"
  } else {
    X[i] = "none"
  }
}
The problem, obviously, is that this takes a long time to run on large data frames. I should vectorize it, but I cannot see how, since functions such as *apply, ifelse or any do not seem to help in a task like this, where the result is not boolean.
Any ideas?
Upvotes: 1
Views: 4928
Reputation: 56004
We can use factor:
# dummy data
set.seed(1)
MATRIX <- data.frame(Col1 = sample(0:1, 10, replace = TRUE),
                     Col2 = sample(0:1, 10, replace = TRUE))
# using factor
cbind(MATRIX,
      X = factor(paste(as.numeric(MATRIX$Col1 == 1),
                       as.numeric(MATRIX$Col2 == 1), sep = "_"),
                 levels = c("0_0", "0_1", "1_0", "1_1"),
                 labels = c("none", "col2", "col1", "both")))
# Col1 Col2 X
# 1 0 0 none
# 2 0 0 none
# 3 1 1 both
# 4 1 0 col1
# 5 0 1 col2
# 6 1 0 col1
# 7 1 1 both
# 8 1 1 both
# 9 1 0 col1
# 10 0 1 col2
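To get X as a column of the data frame rather than a printed cbind result, the same key-then-label idea can be assigned directly. This is only a minimal sketch: the four-row data frame is made up for illustration (it is not the random data above), and the intermediate name `key` is an assumption of this example.

```r
# Small hand-written data frame (hypothetical values) so the result is easy to check
MATRIX <- data.frame(Col1 = c(0, 1, 1, 0),
                     Col2 = c(0, 0, 1, 1))

# Build a two-digit key per row, e.g. "1_0" means Col1 set and Col2 not set
key <- paste(as.numeric(MATRIX$Col1 == 1),
             as.numeric(MATRIX$Col2 == 1), sep = "_")

# Map each key to its label and store the factor as a new column
MATRIX$X <- factor(key,
                   levels = c("0_0", "0_1", "1_0", "1_1"),
                   labels = c("none", "col2", "col1", "both"))

levels(MATRIX$X)        # all four levels are kept, even if some never occur
as.character(MATRIX$X)  # "none" "col1" "both" "col2"
table(MATRIX$X)         # counts per level
```

Because the levels are listed explicitly, a combination that never occurs in the data still appears as a level with a count of zero.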
Upvotes: 2
Reputation: 7941
Here are a couple of ways to do it.
The most analogous to your existing method is:
X <- ifelse(MATRIX$Col1==1,
ifelse(MATRIX$Col2==1,"both","col1"),
ifelse(MATRIX$Col2==1,"col2","none"))
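One way to gain confidence in the nested ifelse is to compare it against the loop from the question on a small input. The sketch below assumes a made-up five-row data frame; the names `X_loop`, `X_vec` and `X_fac` are introduced here only for illustration.

```r
# Hypothetical test data covering all four combinations
MATRIX <- data.frame(Col1 = c(0, 1, 1, 0, 1),
                     Col2 = c(0, 0, 1, 1, 1))

# Reference result: the explicit loop from the question
X_loop <- character(nrow(MATRIX))
for (i in seq_len(nrow(MATRIX))) {
  if (MATRIX$Col1[i] == 1 && MATRIX$Col2[i] == 1) {
    X_loop[i] <- "both"
  } else if (MATRIX$Col1[i] == 1) {
    X_loop[i] <- "col1"
  } else if (MATRIX$Col2[i] == 1) {
    X_loop[i] <- "col2"
  } else {
    X_loop[i] <- "none"
  }
}

# Vectorized result: nested ifelse evaluated on whole columns at once
X_vec <- ifelse(MATRIX$Col1 == 1,
                ifelse(MATRIX$Col2 == 1, "both", "col1"),
                ifelse(MATRIX$Col2 == 1, "col2", "none"))

identical(X_loop, X_vec)  # TRUE

# ifelse returns a character vector; wrap it in factor() if a factor is needed
X_fac <- factor(X_vec, levels = c("none", "col1", "col2", "both"))
```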
It can be slightly quicker to do:
x <- rep(NA_character_, nrow(MATRIX))
# use whole columns (no [i]) and the element-wise & rather than the scalar &&
x[MATRIX$Col1==1 & MATRIX$Col2==1] <- "both"
x[MATRIX$Col1==1 & !MATRIX$Col2==1] <- "col1"
x[!MATRIX$Col1==1 & MATRIX$Col2==1] <- "col2"
x[!MATRIX$Col1==1 & !MATRIX$Col2==1] <- "none"
but it's harder to see whether all cases have been covered by the code.
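Since the vector starts out as all NA, any row that no condition matched is still NA afterwards, which gives a cheap coverage check. The following is a sketch under assumptions: the data frame is invented, with one deliberately missing value, and `c1`/`c2` are helper names used only here. The conditions are wrapped in which() so that NA comparisons are dropped rather than used as indices.

```r
# Hypothetical data; the NA in row 5 is deliberate, to produce an uncovered case
MATRIX <- data.frame(Col1 = c(0, 1, 1, 0, NA),
                     Col2 = c(0, 0, 1, 1, 1))

c1 <- MATRIX$Col1 == 1
c2 <- MATRIX$Col2 == 1

x <- rep(NA_character_, nrow(MATRIX))
x[which(c1 & c2)]   <- "both"
x[which(c1 & !c2)]  <- "col1"
x[which(!c1 & c2)]  <- "col2"
x[which(!c1 & !c2)] <- "none"

anyNA(x)                    # TRUE: at least one row matched no condition
which(is.na(x))             # 5: the row with the missing Col1
table(x, useNA = "ifany")   # counts per label, plus the NA count
```

If anyNA(x) is FALSE, every row was assigned a label by one of the four conditions.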
Note: MATRIX really is a data.frame; learning to be precise about your data types can really help when debugging code. If MATRIX$Col1 really is Boolean, you can drop the ==1 comparison, which wastes time by converting the column to numeric and then testing for equality.
Upvotes: 2