Eams

Reputation: 59

Making a new column in R based on results of previous columns

I'm trying to add a new column to a table based on two existing columns. Both columns contain logical values, and the new column should encode their combination (so if the value is FALSE in both columns the row belongs to group 1, if it is FALSE in Choice1 but TRUE in Choice2 the row belongs to group 2, etc.).

My data table looks as follows:

Choice1  Choice2 
TRUE     FALSE     
TRUE     TRUE      
FALSE    FALSE     
FALSE    TRUE      
TRUE     TRUE      

I want to create a new column where individuals are grouped into one of 4 groups: 1,2,3,4.

1 = FALSE, FALSE

2 = FALSE, TRUE

3 = TRUE, FALSE

4 = TRUE, TRUE

So the results for the above table would be 3,4,1,2,4

I initially tried code along the lines of what is shown below, but it doesn't work.

df$newcolumn <- c("(df$Choice1=FALSE,df$Choice2=FALSE)"="1", "(df$Choice1=FALSE, df$Choice2=TRUE)"="2"), "(df$Choice1=TRUE, df$Choice2=FALSE)"="3", "(df$Choice1=TRUE, df$Choice2=TRUE)", ="4")

I've also looked online for similar examples that I could adapt to my case, but haven't been able to find any. Any help would be appreciated.

Upvotes: 3

Views: 106

Answers (3)

G. Grothendieck

Reputation: 269471

1) linear formula The formula below gives the group number as a factor. (The conversion to a factor was requested in a comment below.)

transform(dat1, group = factor(1 + 2*Choice1 + Choice2))
##   Choice1 Choice2 group
## 1    TRUE   FALSE     3
## 2    TRUE    TRUE     4
## 3   FALSE   FALSE     1
## 4   FALSE    TRUE     2
## 5    TRUE    TRUE     4
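
As a minimal sketch of why this formula works (the variable name group_num is only illustrative, and the data frame is rebuilt from the Note at the end of this answer): logicals are coerced to 0/1 in arithmetic, so the pair (Choice1, Choice2) behaves like a two-digit binary number, and adding 1 shifts the range 0..3 to 1..4.

```r
# Input data, rebuilt from the Note at the end of this answer.
dat1 <- data.frame(
  Choice1 = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  Choice2 = c(FALSE, TRUE, FALSE, TRUE, TRUE)
)

# TRUE counts as 1 and FALSE as 0, so 2*Choice1 + Choice2 runs over 0..3;
# adding 1 gives the group numbers 1..4 requested in the question.
group_num <- with(dat1, 1 + 2 * Choice1 + Choice2)
group_num
## [1] 3 4 1 2 4
```

Wrapping the result in factor(), as above, is only needed if the group should be treated as categorical.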

We can derive the above formula using regression. For this to work it is essential that every combination be represented in dat1, which is, in fact, the case.

y <- c(3, 4, 1, 2, 4)
fm <- lm(y ~., dat1)
fm
##
## Call:
## lm(formula = y ~ ., data = dat1)
##
## Coefficients:
## (Intercept)  Choice1TRUE  Choice2TRUE  
##           1            2            1  

# check that it is exact, i.e. residual sum of squares is 0 (up to floating-point error)
deviance(fm)
## [1] 3.574526e-31
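
The two conditions mentioned above can also be checked directly. This is only a sketch: the use of unique() and all.equal() here is my own choice rather than part of the derivation, and the comparison uses a tolerance because the fitted coefficients are floating-point values.

```r
# Input data and target groups, as in the regression above.
dat1 <- data.frame(
  Choice1 = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  Choice2 = c(FALSE, TRUE, FALSE, TRUE, TRUE)
)
y <- c(3, 4, 1, 2, 4)

# Check that all four (Choice1, Choice2) combinations occur in dat1.
all_present <- nrow(unique(dat1)) == 4
all_present
## [1] TRUE

fm <- lm(y ~ ., dat1)

# Compare the coefficients to 1, 2, 1 with a tolerance, not exactly.
coef_ok <- isTRUE(all.equal(unname(coef(fm)), c(1, 2, 1)))
coef_ok
## [1] TRUE
```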

2) explicit combinations Another approach is to explicitly consider each combination:

transform(dat1, group = factor(
  1 * (!Choice1 & !Choice2) +
  2 * (!Choice1 & Choice2) + 
  3 * (Choice1 & !Choice2) + 
  4 * (Choice1 & Choice2)))

giving:

  Choice1 Choice2 group
1    TRUE   FALSE     3
2    TRUE    TRUE     4
3   FALSE   FALSE     1
4   FALSE    TRUE     2
5    TRUE    TRUE     4

3) interaction A third approach is to use interaction to create the group factor.

transform(dat1, group = factor(interaction(Choice2, Choice1), labels = ""))

giving:

  Choice1 Choice2 group
1    TRUE   FALSE     3
2    TRUE    TRUE     4
3   FALSE   FALSE     1
4   FALSE    TRUE     2
5    TRUE    TRUE     4
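
To see why Choice2 is passed before Choice1, it may help to look at the levels that interaction produces. The sketch below (the name g is only illustrative) relies on interaction varying its first argument fastest by default, which yields the order wanted in the question.

```r
# Input data, rebuilt from the Note at the end of this answer.
dat1 <- data.frame(
  Choice1 = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  Choice2 = c(FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Level names read "Choice2.Choice1"; the first argument varies fastest.
g <- with(dat1, interaction(Choice2, Choice1))
levels(g)
## [1] "FALSE.FALSE" "TRUE.FALSE"  "FALSE.TRUE"  "TRUE.TRUE"

# The integer codes of the levels are the group numbers.
as.integer(g)
## [1] 3 4 1 2 4
```

As far as I can tell, interaction keeps all four levels even when a combination is absent from the data, whereas wrapping it in factor() drops unused levels before relabelling, so the integer codes may be the safer choice for data that does not contain every combination.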

Note

The input in reproducible form is:

dat1 <- structure(list(Choice1 = c(TRUE, TRUE, FALSE, FALSE, TRUE), 
  Choice2 = c(FALSE, 
  TRUE, FALSE, TRUE, TRUE)), class = "data.frame", row.names = 
  c(NA, -5L))

Upvotes: 3

Eric

Reputation: 2849

Here is a data.table approach using the fcase function, which is a fast implementation of the SQL CASE WHEN statement for R.

df <- structure(list(Choice1 = c(TRUE, TRUE, FALSE, FALSE, TRUE), Choice2 = c(FALSE, 
TRUE, FALSE, TRUE, TRUE)), class = "data.frame", row.names = c(NA, 
-5L))

library(data.table)

setDT(df)

df[, group := fcase(
  Choice1 == FALSE & Choice2 == FALSE, 1,
  Choice1 == FALSE & Choice2 == TRUE, 2, 
  Choice1 == TRUE & Choice2 == FALSE, 3,
  Choice1 == TRUE & Choice2 == TRUE, 4
)]


#>    Choice1 Choice2 group
#> 1:    TRUE   FALSE     3
#> 2:    TRUE    TRUE     4
#> 3:   FALSE   FALSE     1
#> 4:   FALSE    TRUE     2
#> 5:    TRUE    TRUE     4

Created on 2021-03-16 by the reprex package (v0.3.0)

Upvotes: 0

TarJae

Reputation: 78917

Here is a solution with case_when from the dplyr package:

library(dplyr)

# your data:
df <- tribble(
~Choice1,  ~Choice2, 
TRUE,     FALSE,     
TRUE,     TRUE,      
FALSE,    FALSE,     
FALSE,    TRUE ,     
TRUE,     TRUE) 

df1 <- df %>% 
  mutate(new_col = case_when(Choice1==FALSE & Choice2 == FALSE ~ 1,
                             Choice1==FALSE & Choice2 == TRUE ~ 2,
                             Choice1==TRUE & Choice2 == FALSE ~ 3,
                             Choice1==TRUE & Choice2 == TRUE ~4))
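
For reference, here is a sketch of the same call written with negation instead of comparisons against TRUE and FALSE, followed by the resulting column as plain text. It assumes dplyr is installed; for logical columns, !Choice1 is equivalent to Choice1 == FALSE.

```r
library(dplyr)

# Same data as above, rebuilt so the sketch is self-contained.
df <- tribble(
  ~Choice1, ~Choice2,
  TRUE,  FALSE,
  TRUE,  TRUE,
  FALSE, FALSE,
  FALSE, TRUE,
  TRUE,  TRUE
)

# Each row matches exactly one of the four conditions.
df1 <- df %>%
  mutate(new_col = case_when(
    !Choice1 & !Choice2 ~ 1,
    !Choice1 &  Choice2 ~ 2,
     Choice1 & !Choice2 ~ 3,
     Choice1 &  Choice2 ~ 4
  ))

df1$new_col
#> [1] 3 4 1 2 4
```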


Upvotes: 1
