Reputation: 21
I am new to R and I am trying to split column "A" into two groups based on the categorical values in the column "Outcome". Then I want to perform a t-test on the two groups.
The data looks like this:
Column "A" = 4, 6, 8, 10, 11...
Column "Outcome" = Disease, Disease, No disease...
I want to split column "A", which contains continuous values, into two groups/columns based on whether the row has the disease or not.
For example, I want the data to look as follows:
Disease vector: 4, 6, ... No disease vector: 10, 11, ...
I tried the following command, and it separated column "A" into two groups based on the values in column "Outcome". But I want to assign the values to two vectors so that I can use the t.test command, i.e., t.test(vector1, vector2).
The command I used was:
split(data$A, f = data$Outcome)
I also tried using an if-else command, but that didn't work either.
I know this should be something very easy, but I can't find the proper command to do this.
Upvotes: 1
Views: 78
Reputation: 17304
You don't really need to split in this specific case; t.test() handles it for you. But to answer your question:
# generate sample data:
set.seed(1)
data <- data.frame(A = rbinom(10, 20, .5),
Outcome = sample(c("Disease", "No disease"), 10, replace = TRUE))
str(data)
#> 'data.frame': 10 obs. of 2 variables:
#> $ A : int 9 9 10 13 8 13 14 11 11 7
#> $ Outcome: chr "Disease" "Disease" "Disease" "Disease" ...
### split by subsetting:
x <- data$A[data$Outcome == "Disease"]
y <- data$A[data$Outcome == "No disease"]  # label must match the data exactly (case-sensitive)
# t.test(x, y)
### split() with vectors, returns a list and you can pass list items to t.test():
a <- split(data$A, data$Outcome)
str(a)
#> List of 2
#> $ Disease : int [1:6] 9 9 10 13 8 7
#> $ No disease: int [1:4] 13 14 11 11
# t.test(a$Disease, a$`No disease`)
### split data.frame and use columns of list item with t.test():
d_splt <- split(data, ~ Outcome)
d_splt$Disease$A
#> [1] 9 9 10 13 8 7
# t.test(d_splt$Disease$A, d_splt$`No disease`$A)
Or let t.test() handle this:
t.test(A ~ Outcome, data = data)
#>
#> Welch Two Sample t-test
#>
#> data: A by Outcome
#> t = -2.5845, df = 7.8512, p-value = 0.0329
#> alternative hypothesis: true difference in means between group Disease and group No disease is not equal to 0
#> 95 percent confidence interval:
#> -5.5277060 -0.3056273
#> sample estimates:
#> mean in group Disease mean in group No disease
#> 9.333333 12.250000
Created on 2023-07-04 with reprex v2.0.2
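To tie this back to the t.test(vector1, vector2) form from the question, here is a minimal sketch that assigns the two list items returned by split() to named vectors and checks that the two-vector call agrees with the formula call. The names vector1, vector2, res_vec and res_frm are only illustrative, and the sample data is the same simulated data as above, not the asker's real data.

```r
# Same simulated sample data as in the answer above (an assumption, not real data)
set.seed(1)
data <- data.frame(A = rbinom(10, 20, .5),
                   Outcome = sample(c("Disease", "No disease"), 10, replace = TRUE))

# split() returns a named list; pull each group out into its own vector
groups  <- split(data$A, data$Outcome)
vector1 <- groups[["Disease"]]
vector2 <- groups[["No disease"]]

# Two-vector interface vs. formula interface
res_vec <- t.test(vector1, vector2)
res_frm <- t.test(A ~ Outcome, data = data)

# Both calls should give the same test statistic and p-value
stopifnot(isTRUE(all.equal(unname(res_vec$statistic), unname(res_frm$statistic))))
stopifnot(isTRUE(all.equal(res_vec$p.value, res_frm$p.value)))
```

Using groups[["No disease"]] instead of groups$`No disease` avoids the backticks that a name containing a space would otherwise need.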
Upvotes: 0