Kierrajames

Reputation: 21

How to split a continuous variable into multiple groups based on the categorical values in another column

I am new to R and I am trying to split column "A" into two groups based on the categorical values in column "Outcome". Then I want to perform a t-test on the two groups.

The data looks like this:

Column "A" = 4, 6, 8, 10, 11...
Column "Outcome" = Disease, Disease, No disease...

I want to split column "A", which contains continuous values, into two groups/columns based on whether the row has the disease or not.

For example, I want the data to look as follows:

Disease vector: 4, 6, ...
Non-disease vector: 10, 11, ...

I tried the following command, and it separated column "A" into two groups based on the values in column "Outcome". But I want to assign the values to two vectors so that I can use the t.test command, i.e., t.test(vector1, vector2).

The command I used was:

split(data$A, f = data$Outcome)

I also tried using an if-else command, but that didn't work either.

I know this should be something very easy, but I can't find the proper command to do this.

Upvotes: 1

Views: 78

Answers (1)

margusl

Reputation: 17304

You don't really need to split in this specific case; t.test() can handle the grouping for you. But to answer your question:

# generate sample data:
set.seed(1)
data <- data.frame(A = rbinom(10, 20, .5), 
                   Outcome = sample(c("Disease", "No disease"), 10, replace = TRUE))
str(data)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ A      : int  9 9 10 13 8 13 14 11 11 7
#>  $ Outcome: chr  "Disease" "Disease" "Disease" "Disease" ...


### split by subsetting:
x <- data$A[data$Outcome == "Disease"]
y <- data$A[data$Outcome == "No disease"]
# t.test(x, y)

### split() with vectors, returns a list and you can pass list items to t.test():
a <- split(data$A, data$Outcome)
str(a)
#> List of 2
#>  $ Disease   : int [1:6] 9 9 10 13 8 7
#>  $ No disease: int [1:4] 13 14 11 11
# t.test(a$Disease, a$`No disease`)

### split data.frame and use columns of list item with t.test():
d_splt <- split(data, ~ Outcome)
d_splt$Disease$A
#> [1]  9  9 10 13  8  7
# t.test(d_splt$Disease$A, d_splt$`No disease`$A)
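
String comparison with == is case-sensitive, so when subsetting by label it can help to check the exact labels first. Below is a minimal sketch that reuses the sample data from above; the names groups, x and y are only illustrative, not anything the question requires.

```r
# Same sample data as above
set.seed(1)
data <- data.frame(A = rbinom(10, 20, .5),
                   Outcome = sample(c("Disease", "No disease"), 10, replace = TRUE))

# Inspect the exact labels and group sizes before subsetting
table(data$Outcome)

# A label that does not match exactly (note the capital "D") selects nothing
length(data$A[data$Outcome == "No Disease"])  # 0, no row has that exact label

# Assign the elements of the split() result to two plain vectors
groups <- split(data$A, data$Outcome)
x <- groups[["Disease"]]
y <- groups[["No disease"]]

t.test(x, y)
```

Using [[ ]] with the label as a string avoids the backticks that $ needs when a name contains a space.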

Or let t.test() handle this through its formula interface:

t.test(A ~ Outcome, data = data)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  A by Outcome
#> t = -2.5845, df = 7.8512, p-value = 0.0329
#> alternative hypothesis: true difference in means between group Disease and group No disease is not equal to 0
#> 95 percent confidence interval:
#>  -5.5277060 -0.3056273
#> sample estimates:
#>    mean in group Disease mean in group No disease 
#>                 9.333333                12.250000

Created on 2023-07-04 with reprex v2.0.2
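
If you need the numbers rather than the printed summary, the object returned by t.test() can be stored and queried. This is a rough sketch on the same sample data; res and res_pooled are hypothetical names, and whether the pooled-variance variant is appropriate depends on your data.

```r
# Same sample data as above
set.seed(1)
data <- data.frame(A = rbinom(10, 20, .5),
                   Outcome = sample(c("Disease", "No disease"), 10, replace = TRUE))

# Welch test (the default, var.equal = FALSE); the result is a list of class "htest"
res <- t.test(A ~ Outcome, data = data)
res$p.value    # p-value as a plain number
res$estimate   # the two group means
res$conf.int   # confidence interval for the difference in means

# Student's t-test with pooled variance; only sensible if equal
# variances in both groups are a reasonable assumption
res_pooled <- t.test(A ~ Outcome, data = data, var.equal = TRUE)
res_pooled$parameter  # degrees of freedom, n1 + n2 - 2 for the pooled test
```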

Upvotes: 0
