Reputation: 103
I have a data frame with the columns fruits, ripeness, and mean.
How can I create a for loop that runs a t-test to determine the mean difference between ripeness states for EACH fruit? In other words, for apples, the t-test would report the mean difference between ripe and unripe apples.
An example of this would look like the following table.
Upvotes: 1
Views: 86
Reputation: 736
Something like this could work for returning p-values of the t-test comparing "Ripeness" as you loop through the unique "Fruits" that appear in your data.
## filter() comes from dplyr
library(dplyr)
## create a vector of the unique fruit in the data; vector of fruit to be tested
fruit <- unique(data$Fruits)
## iterate through your vector of unique fruit, testing as you go
for (i in seq_along(fruit)) {
  ## subset your data to include only the current fruit to be tested
  df <- filter(data, Fruits == fruit[i])
  ## let the user know which fruit is being tested
  message(fruit[i])
  ## create a vector of the unique ripeness states of the current fruit to be tested
  ripe <- unique(df$Ripeness)
  ## make sure two means exist; ensure there are both ripe and non-ripe values
  if (length(ripe) < 2) {
    ## if only one ripeness, let user know and skip to next unique fruit
    message("only one ripeness")
    next
  }
  ## try testing the fruit and return p-value if success
  tryCatch(
    {
      message(t.test(Mean ~ Ripeness, data = df)$p.value)
    },
    ## if error in t-testing return message that there are "not enough observations"
    error = function(cond) {
      message("not enough observations")
    }
  )
}
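Since the question asks for the mean difference and not only the p-value, here is a minimal sketch of the same idea that collects both into a data frame instead of printing messages. It uses base R only; the example data, the `results` object, and its column names `mean_diff` and `p_value` are made up for illustration, and the column names `Fruits`, `Ripeness`, and `Mean` are assumed to match the loop above.

```r
# Hypothetical example data; column names follow the loop above
set.seed(1)
data <- data.frame(
  Fruits   = rep(c("Apple", "Banana", "Orange"), times = c(10, 5, 10)),
  Ripeness = c(rep(c("ripe", "unripe"), each = 5),  # apples: both states
               rep("ripe", 5),                       # bananas: ripe only
               rep(c("ripe", "unripe"), each = 5)), # oranges: both states
  Mean     = round(runif(25) * 10, 1)
)

# One row per fruit; values stay NA where the test cannot be run
results <- do.call(rbind, lapply(split(data, data$Fruits), function(df) {
  out <- data.frame(Fruit = df$Fruits[1], mean_diff = NA_real_, p_value = NA_real_)
  if (length(unique(df$Ripeness)) == 2) {
    tt <- tryCatch(t.test(Mean ~ Ripeness, data = df), error = function(e) NULL)
    if (!is.null(tt)) {
      # estimate holds the two group means in factor-level order (ripe, unripe)
      out$mean_diff <- unname(tt$estimate[1] - tt$estimate[2])
      out$p_value   <- tt$p.value
    }
  }
  out
}))
results
```

Fruits with only one ripeness state (bananas here) keep `NA` in both columns, so they remain visible in the result instead of being silently dropped.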
I hope this helps!
Upvotes: 2
Reputation: 72828
Assuming fruits is coded as a categorical variable (i.e. a factor, as it should be), you could use sapply to iteratively subset the data by each fruit. In t.test we use alternative="two.sided" just for emphasis, although it's the default setting.
However, your data is very small and Bananas are only ripe. I therefore use a larger sample data set to demonstrate.
res <- sapply(levels(dat$fruits), function(x)
  t.test(mean ~ ripeness, dat[dat$fruits %in% x, ], alternative="two.sided")
)
res
#             Apple                     Banana                    Orange
# statistic   0.948231                  0.3432062                 0.4421971
# parameter   23.38387                  30.86684                  16.47366
# p.value     0.3527092                 0.7337699                 0.664097
# conf.int    Numeric,2                 Numeric,2                 Numeric,2
# estimate    Numeric,2                 Numeric,2                 Numeric,2
# null.value  0                         0                         0
# stderr      0.8893453                 1.16548                   1.043739
# alternative "two.sided"               "two.sided"               "two.sided"
# method      "Welch Two Sample t-test" "Welch Two Sample t-test" "Welch Two Sample t-test"
# data.name   "mean by ripeness"        "mean by ripeness"        "mean by ripeness"
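The estimate row only prints as Numeric,2, so the mean difference itself is not visible in that output. As a rough sketch, the p-values and mean differences can be pulled out of res by row name. The names `p_values`, `mean_diffs`, and `summary_df` are illustrative, the data and res are rebuilt so the snippet runs on its own, and the exact numbers may differ from the output above depending on the R version's random number generator.

```r
# Sample data and test results, rebuilt here so the snippet is self-contained
set.seed(42)
n <- 1e2
dat <- data.frame(fruits=factor(sample(1:3, n, replace=TRUE),
                                labels=c("Apple", "Banana", "Orange")),
                  ripeness=factor(rbinom(n, 1, .4), labels=c("yes", "no")),
                  mean=round(runif(n)*10))
res <- sapply(levels(dat$fruits), function(x)
  t.test(mean ~ ripeness, dat[dat$fruits %in% x, ], alternative="two.sided")
)

# res is a list-matrix: rows are components of the test result, columns are fruits
p_values <- unlist(res["p.value", ])
# each estimate holds the two group means in factor-level order ("yes", "no")
mean_diffs <- sapply(res["estimate", ], function(est) unname(est[1] - est[2]))

summary_df <- data.frame(mean_diff = mean_diffs, p_value = p_values)
summary_df
```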
Data:
set.seed(42)
n <- 1e2
dat <- data.frame(fruits=factor(sample(1:3, n, replace=TRUE),
                                labels=c("Apple", "Banana", "Orange")),
                  ripeness=factor(rbinom(n, 1, .4), labels=c("yes", "no")),
                  mean=round(runif(n)*10))
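Because a fruit with only one ripeness state cannot be tested, it may help to tabulate the group sizes before running the tests. A small sketch using the sample data, rebuilt so it runs on its own; `counts` and `testable` are illustrative names, and the threshold of two observations per group reflects what the Welch test needs.

```r
set.seed(42)
n <- 1e2
dat <- data.frame(fruits=factor(sample(1:3, n, replace=TRUE),
                                labels=c("Apple", "Banana", "Orange")),
                  ripeness=factor(rbinom(n, 1, .4), labels=c("yes", "no")),
                  mean=round(runif(n)*10))

# Number of observations per fruit and ripeness state
counts <- table(dat$fruits, dat$ripeness)
counts

# Fruits with at least two observations in each ripeness group
testable <- rownames(counts)[apply(counts >= 2, 1, all)]
testable
```

Restricting the sapply call to `testable` instead of `levels(dat$fruits)` would skip fruits that lack one of the groups.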
Please note for the future that you should include a minimal self-contained example, including data in an appropriate format (never images; please read here on how to do that) and all the steps you've tried so far, since Stack Overflow is not a coding service. Cheers!
Upvotes: 1