Reputation: 59
I'm trying to create a new column in a table based on two existing columns. Both columns contain logical values, and I want the new column to assign each row to a group based on them (so if the answer is FALSE in both columns the row belongs to group 1, if it is FALSE in Choice1 but TRUE in Choice2 it belongs to group 2, etc.).
My data table looks as follows:
Choice1 Choice2
TRUE FALSE
TRUE TRUE
FALSE FALSE
FALSE TRUE
TRUE TRUE
I want to create a new column where individuals are grouped into one of 4 groups: 1,2,3,4.
1 = FALSE, FALSE
2 = FALSE, TRUE
3 = TRUE, FALSE
4 = TRUE, TRUE
So the results for the above table would be 3, 4, 1, 2, 4.
I initially tried code along the lines of what's shown below, but it doesn't work.
df$newcolumn <- c("(df$Choice1=FALSE,df$Choice2=FALSE)"="1", "(df$Choice1=FALSE, df$Choice2=TRUE)"="2"), "(df$Choice1=TRUE, df$Choice2=FALSE)"="3", "(df$Choice1=TRUE, df$Choice2=TRUE)", ="4")
I've also looked online for similar examples that I could adapt to my case, but haven't been able to find any. Any help would be appreciated.
Upvotes: 3
Views: 106
Reputation: 269471
1) linear formula The formula below gives the group number as a factor. The factor part was requested in a comment below.
transform(dat1, group = factor(1 + 2*Choice1 + Choice2))
## Choice1 Choice2 group
## 1 TRUE FALSE 3
## 2 TRUE TRUE 4
## 3 FALSE FALSE 1
## 4 FALSE TRUE 2
## 5 TRUE TRUE 4
We can derive the above formula using regression. For this to work it is essential that every combination be represented in dat1, which is, in fact, the case here.
y <- c(3, 4, 1, 2, 4)
fm <- lm(y ~., dat1)
fm
##
## Call:
## lm(formula = y ~ ., data = dat1)
##
## Coefficients:
## (Intercept) Choice1TRUE Choice2TRUE
## 1 2 1
# check that it is exact, i.e. residual sum of squares is 0
deviance(fm)
## [1] 3.574526e-31
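To see why the linear formula in 1) works, it may help to note that R coerces logical values to 0/1 in arithmetic, so Choice1 acts like the "twos" digit and Choice2 like the "ones" digit of a two-bit number. The following is a minimal sketch using plain vectors that mirror the columns of dat1 (the standalone vectors are an assumption made here for illustration; the answer itself works on the data frame):

```r
# Assumption: plain vectors standing in for dat1$Choice1 and dat1$Choice2.
Choice1 <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
Choice2 <- c(FALSE, TRUE, FALSE, TRUE, TRUE)

# TRUE/FALSE are coerced to 1/0, so (Choice1, Choice2) is read as a
# two-bit binary number, shifted by 1 so that groups start at 1.
group <- 1 + 2 * Choice1 + Choice2
group
## [1] 3 4 1 2 4
```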
2) explicit combinations Another approach is to explicitly consider each combination:
transform(dat1, group = factor(
1 * (!Choice1 & !Choice2) +
2 * (!Choice1 & Choice2) +
3 * (Choice1 & !Choice2) +
4 * (Choice1 & Choice2)))
giving:
Choice1 Choice2 group
1 TRUE FALSE 3
2 TRUE TRUE 4
3 FALSE FALSE 1
4 FALSE TRUE 2
5 TRUE TRUE 4
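The sum in 2) picks out a single group number because exactly one of the four conditions is TRUE for any row, so every other term contributes 0. A small sketch that checks this, again using plain vectors as a stand-in for the columns of dat1 (the name conds is hypothetical and only used here):

```r
# Assumption: plain vectors standing in for dat1$Choice1 and dat1$Choice2.
Choice1 <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
Choice2 <- c(FALSE, TRUE, FALSE, TRUE, TRUE)

# One column per combination, in group order 1..4.
conds <- cbind(!Choice1 & !Choice2,
               !Choice1 &  Choice2,
                Choice1 & !Choice2,
                Choice1 &  Choice2)

# Each row has exactly one TRUE, so the weighted sum selects one weight.
rowSums(conds)
## [1] 1 1 1 1 1

# The position of the single TRUE in each row is the group number.
group <- apply(conds, 1, which)
group
## [1] 3 4 1 2 4
```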
3) interaction A third approach is to use interaction to create the group factor.
transform(dat1, group = factor(interaction(Choice2, Choice1), labels = ""))
giving:
Choice1 Choice2 group
1 TRUE FALSE 3
2 TRUE TRUE 4
3 FALSE FALSE 1
4 FALSE TRUE 2
5 TRUE TRUE 4
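The argument order in 3), with Choice2 first, matters: by default interaction orders its levels so that the first factor varies fastest, which happens to match the requested numbering. A short sketch that inspects the levels, using plain vectors in place of the dat1 columns (an assumption made for illustration):

```r
# Assumption: plain vectors standing in for dat1$Choice1 and dat1$Choice2.
Choice1 <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
Choice2 <- c(FALSE, TRUE, FALSE, TRUE, TRUE)

f <- interaction(Choice2, Choice1)

# Levels are written as Choice2.Choice1, with Choice2 varying fastest.
levels(f)
## [1] "FALSE.FALSE" "TRUE.FALSE"  "FALSE.TRUE"  "TRUE.TRUE"

# The integer codes of the factor are therefore the group numbers.
as.integer(f)
## [1] 3 4 1 2 4
```

Swapping the arguments to interaction(Choice1, Choice2) would exchange groups 2 and 3.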
The input in reproducible form is:
dat1 <- structure(list(Choice1 = c(TRUE, TRUE, FALSE, FALSE, TRUE),
Choice2 = c(FALSE,
TRUE, FALSE, TRUE, TRUE)), class = "data.frame", row.names =
c(NA, -5L))
Upvotes: 3
Reputation: 2849
Here is a data.table approach using the fcase function, which is a fast implementation of the SQL CASE WHEN statement for R.
df <- structure(list(Choice1 = c(TRUE, TRUE, FALSE, FALSE, TRUE), Choice2 = c(FALSE,
TRUE, FALSE, TRUE, TRUE)), class = "data.frame", row.names = c(NA,
-5L))
library(data.table)
setDT(df)
df[, group := fcase(
Choice1 == FALSE & Choice2 == FALSE, 1,
Choice1 == FALSE & Choice2 == TRUE, 2,
Choice1 == TRUE & Choice2 == FALSE, 3,
Choice1 == TRUE & Choice2 == TRUE, 4
)]
#> Choice1 Choice2 group
#> 1: TRUE FALSE 3
#> 2: TRUE TRUE 4
#> 3: FALSE FALSE 1
#> 4: FALSE TRUE 2
#> 5: TRUE TRUE 4
Created on 2021-03-16 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 78917
Here is a solution with case_when from the dplyr package:
library(dplyr)
# your data:
df <- tribble(
~Choice1, ~Choice2,
TRUE, FALSE,
TRUE, TRUE,
FALSE, FALSE,
FALSE, TRUE ,
TRUE, TRUE)
df1 <- df %>%
  mutate(new_col = case_when(
    Choice1 == FALSE & Choice2 == FALSE ~ 1,
    Choice1 == FALSE & Choice2 == TRUE ~ 2,
    Choice1 == TRUE & Choice2 == FALSE ~ 3,
    Choice1 == TRUE & Choice2 == TRUE ~ 4
  ))
Upvotes: 1