Reputation: 113
I have four logical variables, as below, in a data.frame that continues for thousands of rows:
One Two Three Four
TRUE TRUE FALSE FALSE
FALSE TRUE TRUE TRUE
TRUE FALSE FALSE TRUE
TRUE TRUE TRUE FALSE
FALSE TRUE FALSE TRUE
FALSE FALSE TRUE FALSE
TRUE FALSE FALSE TRUE
I want to create two new variables: one that merges columns One and Two, and another that merges columns Three and Four. Each new column should be TRUE if either or both of its two source columns are TRUE, and FALSE if both are FALSE. The resulting data would look like this:
One Two OneTwo Three Four ThreeFour
TRUE TRUE TRUE FALSE FALSE FALSE
FALSE TRUE TRUE TRUE TRUE TRUE
TRUE FALSE TRUE FALSE TRUE TRUE
TRUE TRUE TRUE TRUE FALSE TRUE
FALSE TRUE TRUE FALSE TRUE TRUE
FALSE FALSE FALSE TRUE FALSE TRUE
TRUE FALSE TRUE FALSE TRUE TRUE
Any help would be much appreciated. I've looked through some other questions but can't find how to do this specifically.
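Since the rule above is an element-wise logical OR, base R's | operator already expresses it. A minimal sketch, assuming the data frame is named df (the name is an assumption) and holds the seven sample rows from the question:

```r
# Rebuild the sample data from the question
df <- data.frame(
  One   = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE),
  Two   = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  Three = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  Four  = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# `|` is vectorised: it ORs the two columns row by row
df$OneTwo    <- df$One | df$Two
df$ThreeFour <- df$Three | df$Four

# Put each new column next to its sources, as in the desired layout
df <- df[c("One", "Two", "OneTwo", "Three", "Four", "ThreeFour")]
df
```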
Upvotes: 8
Views: 1310
Reputation: 34406
You can achieve this in a vectorized way:
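# Recycled logical index: tf picks the odd-numbered columns, rev(tf) the even-numbered ones
tf <- c(TRUE, FALSE)
nm <- names(df)
# Merge: OR each odd column with the even column that follows it
res <- cbind(df, df[tf] | df[rev(tf)])
# Set the names, e.g. "V1V2"
names(res) <- c(nm, paste0(nm[tf], nm[rev(tf)]))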
Gives:
V1 V2 V3 V4 V5 V6 V1V2 V3V4 V5V6
1 FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
2 TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
3 TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
4 TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE
5 TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
Data:
set.seed(5)
df <- as.data.frame(matrix(sample(c(TRUE, FALSE), 30, replace = TRUE), 5))
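The trick in this answer is that a length-2 logical index is recycled along the columns, so tf picks the odd-numbered columns and rev(tf) the even-numbered ones. A small sketch using the question's column names on a hypothetical one-row data frame:

```r
# Hypothetical one-row data frame with the question's column names
df <- data.frame(One = TRUE, Two = FALSE, Three = FALSE, Four = TRUE)
nm <- names(df)
tf <- c(TRUE, FALSE)

# The length-2 index is recycled over the four columns:
# tf selects positions 1 and 3, rev(tf) selects positions 2 and 4
names(df[tf])        # "One" "Three"
names(df[rev(tf)])   # "Two" "Four"

# Pasting the two halves pairwise builds the new column names
paste0(nm[tf], nm[rev(tf)])  # "OneTwo" "ThreeFour"
```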
Upvotes: 4
Reputation: 126
Using the dplyr package, you can do this:
library(dplyr)
data <- data %>%
  mutate(
    OneTwo = as.logical(One + Two),
    ThreeFour = as.logical(Three + Four))
This works because TRUE and FALSE are treated as 1 and 0 in arithmetic, so One + Two is 0, 1 or 2, and as.logical() turns any non-zero value into TRUE and 0 into FALSE. To be a bit more "correct", you could instead use pmax() to get back 0s and 1s before converting them to logicals:
library(dplyr)
data <- data %>%
  mutate(
    OneTwo = as.logical(pmax(One, Two)),
    ThreeFour = as.logical(pmax(Three, Four)))
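A quick base-R check of the coercion argument above, on four hypothetical input pairs covering every TRUE/FALSE combination (no dplyr needed):

```r
# Four hypothetical pairs covering every TRUE/FALSE combination
one <- c(TRUE, FALSE, TRUE, FALSE)
two <- c(TRUE, TRUE, FALSE, FALSE)

# In arithmetic TRUE counts as 1 and FALSE as 0
s <- one + two
s              # 2 1 1 0
as.logical(s)  # TRUE TRUE TRUE FALSE: any non-zero value becomes TRUE

# pmax() keeps the larger value at each position, so every value is 0 or 1
m <- pmax(one, two)
m              # 1 1 1 0
as.logical(m)  # TRUE TRUE TRUE FALSE

# Both routes agree with the element-wise OR
identical(as.logical(s), one | two)  # TRUE
identical(as.logical(m), one | two)  # TRUE
```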
Upvotes: 6
Reputation: 8146
Using case_when from the dplyr package:
library(dplyr)
df %>%
  mutate(OneTwo = case_when(One == TRUE & Two == TRUE ~ TRUE,
                            One == FALSE & Two == TRUE ~ TRUE,
                            One == TRUE & Two == FALSE ~ TRUE,
                            One == FALSE & Two == FALSE ~ FALSE),
         ThreeFour = case_when(Three == TRUE & Four == TRUE ~ TRUE,
                               Three == FALSE & Four == TRUE ~ TRUE,
                               Three == TRUE & Four == FALSE ~ TRUE,
                               Three == FALSE & Four == FALSE ~ FALSE))
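Because the three TRUE branches and the one FALSE branch cover all four input combinations, each case_when() call above is equivalent to a plain OR (One | Two inside mutate()). A base-R sketch that checks this over the full truth table, with expand.grid() generating the combinations:

```r
# Every combination of two logical inputs
grid <- expand.grid(One = c(TRUE, FALSE), Two = c(TRUE, FALSE))

# The three case_when() branches that return TRUE, OR-ed together;
# the remaining FALSE/FALSE row is the branch that returns FALSE
by_cases <- with(grid, (One & Two) | (!One & Two) | (One & !Two))

# The plain element-wise OR
by_or <- grid$One | grid$Two

cbind(grid, by_cases, by_or)
identical(by_cases, by_or)  # TRUE
```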
Upvotes: 1
Reputation: 388907
Here's a way that extends to any number of columns: it groups every two adjacent columns and ORs the columns within each group.
#Create group of every two columns
cols <- ceiling(seq_len(ncol(df))/2)
#Create column names
new_col <- tapply(names(df), cols, paste0, collapse = "")
#Split every two columns and use `|`.
df[new_col] <- sapply(split.default(df, cols), function(x) Reduce(`|`, x))
df
# One Two Three Four OneTwo ThreeFour
#1 TRUE TRUE FALSE FALSE TRUE FALSE
#2 FALSE TRUE TRUE TRUE TRUE TRUE
#3 TRUE FALSE FALSE TRUE TRUE TRUE
#4 TRUE TRUE TRUE FALSE TRUE TRUE
#5 FALSE TRUE FALSE TRUE TRUE TRUE
#6 FALSE FALSE TRUE FALSE FALSE TRUE
#7 TRUE FALSE FALSE TRUE TRUE TRUE
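The grouping vector is what makes this generalize: ceiling(seq_len(ncol(df))/2) gives 1 1 2 2 ..., so replacing the 2 with k groups every k adjacent columns. A sketch wrapping the same steps in a hypothetical helper, or_groups() (the name and the k argument are assumptions, not part of the answer); if the column count is not a multiple of k, the last group is simply smaller:

```r
# Hypothetical helper: OR together every k adjacent columns of a logical data frame
or_groups <- function(df, k = 2) {
  # Group index per column, e.g. 1 1 2 2 for four columns and k = 2
  cols <- ceiling(seq_len(ncol(df)) / k)
  # New names: paste the source names within each group, e.g. "OneTwo"
  new_col <- as.character(tapply(names(df), cols, paste0, collapse = ""))
  # Reduce(`|`, x) ORs all columns of one group, row by row
  df[new_col] <- lapply(split.default(df, cols), function(x) Reduce(`|`, x))
  df
}

# Usage on a three-row sample with the question's column names
d <- data.frame(One = c(TRUE, FALSE, FALSE), Two = c(FALSE, FALSE, TRUE),
                Three = c(FALSE, FALSE, FALSE), Four = c(TRUE, FALSE, FALSE))
or_groups(d)         # adds OneTwo and ThreeFour
or_groups(d, k = 4)  # adds a single OneTwoThreeFour column
```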
Upvotes: 3
Reputation: 173793
A generalizable solution for many columns. Here, the final two columns (named 1 and 2) hold the results of OR-ing each adjacent pair of columns.
cbind(df, do.call(cbind, lapply(seq(length(df)/2) * 2, function(i) df[[i-1]] | df[[i]])))
One Two Three Four 1 2
1 TRUE TRUE FALSE FALSE TRUE FALSE
2 FALSE TRUE TRUE TRUE TRUE TRUE
3 TRUE FALSE FALSE TRUE TRUE TRUE
4 TRUE TRUE TRUE FALSE TRUE TRUE
5 FALSE TRUE FALSE TRUE TRUE TRUE
6 FALSE FALSE TRUE FALSE FALSE TRUE
7 TRUE FALSE FALSE TRUE TRUE TRUE
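If you want readable names instead of 1 and 2, one option is to name the list of pairwise results before adding it to the data frame. A sketch on the first three rows of the question's data (idx and pairs are illustrative names, not from the answer):

```r
# Sample: the first three rows of the question's data
df <- data.frame(One = c(TRUE, FALSE, TRUE), Two = c(TRUE, TRUE, FALSE),
                 Three = c(FALSE, TRUE, FALSE), Four = c(FALSE, TRUE, TRUE))

# Position of the second column in each pair: 2, 4, ...
idx <- seq(2, ncol(df), by = 2)
# OR each pair, as in the lapply() answer
pairs <- lapply(idx, function(i) df[[i - 1]] | df[[i]])
# Name each result after its two source columns, e.g. "OneTwo"
names(pairs) <- paste0(names(df)[idx - 1], names(df)[idx])
df[names(pairs)] <- pairs
df
```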
Upvotes: 3
Reputation: 2306
You could try nested ifelse() calls, evaluated inside with() so that the column names are found:
df$OneTwo <- with(df,
  ifelse(One == TRUE & Two == TRUE, TRUE,
  ifelse(One == TRUE & Two == FALSE, TRUE,
  ifelse(One == FALSE & Two == TRUE, TRUE,
         FALSE))))  # only One == FALSE & Two == FALSE is left
Upvotes: 1