Reputation: 103
I have a set of sample data such as the following:
library(dplyr)
library(tibble)

tableData <- tibble(Fruits = sample(c('Apple', 'Banana', 'Orange'), 30, TRUE),
                    Ripeness = sample(c('yes', 'no'), 30, TRUE),
                    Mean = ifelse(Ripeness == 'yes', 1.4 + runif(30), 1.6 + runif(30))) %>%
  add_row(Fruits = "Peach", Ripeness = "yes", Mean = 5)
I summarise the data by fruit to calculate the difference in means and a t-test p-value:
tableData %>%
  group_by(Fruits) %>%
  summarise(Meandiff = mean(Mean[Ripeness == 'yes']) -
                       mean(Mean[Ripeness == 'no']),
            t_test_pval = get_t_test_pval(Mean ~ Ripeness))
Using summarise, is it also possible to add another column that counts, for each fruit, the number of observations with a ripeness of "yes" (e.g., the number of Apple rows where Ripeness is "yes")?
Upvotes: 0
Views: 38
Reputation: 93811
How about this:
set.seed(2)
tableData <- tibble(Fruits = sample(c('Apple', 'Banana', 'Orange'), 30, TRUE),
                    Ripeness = sample(c('yes', 'no'), 30, TRUE),
                    Mean = ifelse(Ripeness == 'yes', 1.4 + runif(30), 1.6 + runif(30))) %>%
  add_row(Fruits = "Peach", Ripeness = "yes", Mean = 5)
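tableData %>%
  group_by(Fruits) %>%
  summarise(Meandiff = mean(Mean[Ripeness == 'yes']) - mean(Mean[Ripeness == 'no']),
            # t.test() needs both Ripeness levels in the group; return NaN otherwise
            t_test_p_val = if (length(unique(Ripeness)) != 2) NaN else t.test(Mean ~ Ripeness)$p.value,
            # count the rows in each group where Ripeness is "yes"
            N.yes = sum(Ripeness == "yes"))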
  Fruits Meandiff t_test_p_val N.yes
  <chr>     <dbl>        <dbl> <int>
1 Apple    -0.260     0.241        5
2 Banana   -0.223     0.305        4
3 Orange   -0.692     0.000290     7
4 Peach   NaN       NaN            1
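The N.yes column works because comparing Ripeness to "yes" gives a logical vector, and sum() treats TRUE as 1 and FALSE as 0. Here is a minimal base R sketch of that idea; the ripeness vector and the names is_yes and n_yes are made up for illustration and are not taken from tableData:

```r
# Illustrative values only (hypothetical, not from tableData)
ripeness <- c("yes", "no", "yes", "yes", "no")

# Element-wise comparison gives a logical vector
is_yes <- ripeness == "yes"

# sum() coerces TRUE to 1 and FALSE to 0, so this counts the "yes" entries
n_yes <- sum(is_yes)
print(n_yes)
```

Inside summarise() the same expression is evaluated once per group, so each fruit gets its own count.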
Upvotes: 3