NickL

Reputation: 103

How can I use a two sample t-test when there are two groups in R?

I have a data frame with the columns fruits, ripeness, and mean. How can I write a for loop that runs a t-test on the mean difference between ripeness states for EACH fruit? In other words, for apples, the t-test would report the mean difference between ripe and unripe apples. An example of the desired output is shown in the table image (Table Example).

Upvotes: 1

Views: 86

Answers (2)

Joshua Mire

Reputation: 736

Something like this could work for returning p-values of the t-test comparing "Ripeness" as you loop through the unique "Fruits" that appear in your data.

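## filter() below comes from dplyr
library(dplyr)
## create a vector of the unique fruit in the data; vector of fruit to be tested
fruit<-unique(data$Fruits)
## iterate through your vector of unique fruit, testing as you go
## (seq_along() is safe even if the vector is empty, unlike 1:length())
for(i in seq_along(fruit)){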
  ## subset your data to include only the current fruit to be tested
  df<-filter(data, Fruits==fruit[i])
  ## let the user know which fruit is being tested
  message(fruit[i])
  ## create a vector of the unique ripeness states of the current fruit to be tested
  ripe<-unique(df$Ripeness)
  ## make sure two means exist; ensure there are both ripe and non-ripe values
  if(length(ripe) < 2){
    ## if only one ripeness, let user know and skip to next unique fruit
    message("only one ripeness")
    next
  }
  ## try testing the fruit and return p-value if success
  tryCatch(
    {
      message(t.test(Mean ~ Ripeness, data = df)$p.value)
    },
    ## if error in t-testing return message that there are "not enough observations"
    error=function(cond) {
      message("not enough observations")
    }
  )    
}
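
If you want the mean difference itself (as asked in the question) rather than only printed p-values, one option is to collect the results in a data frame. The following is a minimal base R sketch, not a definitive implementation: the sample data and the helper name `test_fruit` are made up for illustration, and the column names `Fruits`, `Ripeness`, and `Mean` are assumed to match the loop above.

```r
# Hypothetical sample data; column names are assumed to match the loop above
set.seed(1)
data <- data.frame(
  Fruits   = rep(c("Apple", "Banana", "Orange"), each = 10),
  Ripeness = c(rep(c("ripe", "unripe"), 5),   # Apple: both states
               rep("ripe", 10),               # Banana: only ripe
               rep(c("ripe", "unripe"), 5)),  # Orange: both states
  Mean     = round(runif(30) * 10, 1)
)

# Hypothetical helper: one Welch t-test per fruit, returned as a one-row data frame
test_fruit <- function(df) {
  out <- data.frame(Fruits = df$Fruits[1], mean_diff = NA_real_, p_value = NA_real_)
  # cannot compare groups when only one ripeness state is present
  if (length(unique(df$Ripeness)) < 2) return(out)
  # t.test() raises an error when a group has too few observations
  tt <- tryCatch(t.test(Mean ~ Ripeness, data = df), error = function(e) NULL)
  if (is.null(tt)) return(out)
  # estimate holds the two group means; take first group minus second group
  out$mean_diff <- unname(tt$estimate[1] - tt$estimate[2])
  out$p_value   <- tt$p.value
  out
}

# split the data by fruit, test each piece, and stack the rows
results <- do.call(rbind, lapply(split(data, data$Fruits), test_fruit))
results
```

Fruits that cannot be tested (here Banana, which has only one ripeness state) get `NA` instead of stopping the run.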

I hope this helps!

Upvotes: 2

jay.sf

Reputation: 72828

Assuming fruits is coded as a categorical variable (i.e. a factor, as it should be), you could use sapply to iteratively subset the data by each fruit. In t.test we use alternative="two.sided" just for emphasis, although it's the default setting.

However, your data is very small and Bananas are only ripe. I therefore use a larger sample data set to demonstrate.

res <- sapply(levels(dat$fruits), function(x) 
  t.test(mean ~ ripeness, dat[dat$fruits %in% x, ], alternative="two.sided")
)
res
#             Apple                     Banana                    Orange                   
# statistic   0.948231                  0.3432062                 0.4421971                
# parameter   23.38387                  30.86684                  16.47366                 
# p.value     0.3527092                 0.7337699                 0.664097                 
# conf.int    Numeric,2                 Numeric,2                 Numeric,2                
# estimate    Numeric,2                 Numeric,2                 Numeric,2                
# null.value  0                         0                         0                        
# stderr      0.8893453                 1.16548                   1.043739                 
# alternative "two.sided"               "two.sided"               "two.sided"              
# method      "Welch Two Sample t-test" "Welch Two Sample t-test" "Welch Two Sample t-test"
# data.name   "mean by ripeness"        "mean by ripeness"        "mean by ripeness"     
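
To pull single quantities out of that result, you can index the matrix by row name. This is a rough sketch that rebuilds the same sample data so it runs on its own; the names `p_values` and `mean_diffs` are only illustrative. The difference is taken as first level minus second level of ripeness ("yes" minus "no" here).

```r
# Same sample data as in the Data section of this answer
set.seed(42)
n <- 1e2
dat <- data.frame(fruits=factor(sample(1:3, n, replace=TRUE),
                                labels=c("Apple", "Banana", "Orange")),
                  ripeness=factor(rbinom(n, 1, .4), labels=c("yes", "no")),
                  mean=round(runif(n)*10))

# One Welch t-test per fruit; sapply() simplifies the results to a list matrix
res <- sapply(levels(dat$fruits), function(x)
  t.test(mean ~ ripeness, dat[dat$fruits %in% x, ], alternative="two.sided")
)

# Each row of res is a list with one element per fruit
p_values <- unlist(res["p.value", ])

# estimate holds the two group means; subtract to get the mean difference
mean_diffs <- sapply(res["estimate", ], function(e) unname(e[1] - e[2]))

data.frame(p_value = p_values, mean_diff = mean_diffs)
```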

Data:

set.seed(42)
n <- 1e2
dat <- data.frame(fruits=factor(sample(1:3, n, replace=TRUE),
                                labels=c("Apple", "Banana", "Orange")),
                  ripeness=factor(rbinom(n, 1, .4), labels=c("yes", "no")),
                  mean=round(runif(n)*10))

Please note for the future that you should include a minimal, self-contained example with data in an appropriate format (never images; please read here on how to do that) and all the steps you've tried so far, since Stack Overflow is not a coding service. Cheers!

Upvotes: 1
