NickL

Reputation: 103

How can I use the count function under certain conditions in R?

I have a set of sample data such as the following:

library(dplyr)   # %>%, group_by(), summarise()
library(tibble)  # tibble(), add_row()

tableData <- tibble(Fruits = sample(c('Apple', 'Banana', 'Orange'), 30, TRUE),
                    Ripeness = sample(c('yes', 'no'), 30, TRUE),
                    Mean = ifelse(Ripeness == 'yes', 1.4 + runif(30), 1.6 + runif(30))) %>% 
  add_row(Fruits = "Peach", Ripeness = "yes", Mean = 5)

I have a summarise call that computes, for each fruit, the difference in means between ripe and unripe observations and a t-test p-value (get_t_test_pval is my own helper function):

tableData %>% 
  group_by(Fruits) %>% 
  summarise(Meandiff = mean(Mean[Ripeness == 'yes']) - mean(Mean[Ripeness == 'no']), 
            t_test_pval = get_t_test_pval(Mean ~ Ripeness))

Within the same summarise call, is it possible to add another column that counts, for each fruit, the number of observations whose ripeness is "yes" (e.g. the number of Apple rows with Ripeness == "yes")?

Upvotes: 0

Views: 38

Answers (1)

eipi10

Reputation: 93811

How about this:

library(dplyr)
library(tibble)

set.seed(2)
tableData <- tibble(Fruits = sample(c('Apple', 'Banana', 'Orange'), 30, TRUE),
                    Ripeness = sample(c('yes', 'no'), 30, TRUE),
                    Mean = ifelse(Ripeness == 'yes', 1.4 + runif(30), 1.6 + runif(30))) %>% 
  add_row(Fruits = "Peach", Ripeness = "yes", Mean = 5)

tableData %>% 
  group_by(Fruits) %>% 
  summarise(Meandiff = mean(Mean[Ripeness == 'yes']) - mean(Mean[Ripeness == 'no']), 
            # t.test() needs both ripeness levels in the group; return NaN otherwise
            t_test_p_val = if (length(unique(Ripeness)) != 2) NaN else t.test(Mean ~ Ripeness)$p.value,
            # Summing a logical vector counts its TRUE values
            N.yes = sum(Ripeness == "yes"))

  Fruits Meandiff t_test_p_val N.yes
  <chr>     <dbl>        <dbl> <int>
1 Apple    -0.260     0.241        5
2 Banana   -0.223     0.305        4
3 Orange   -0.692     0.000290     7
4 Peach   NaN       NaN            1
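
The count works because Ripeness == "yes" is a logical vector, and sum() treats each TRUE as 1 and each FALSE as 0. Here is a minimal sketch on a tiny hand-made data set so the result can be checked by eye. The toy data, the counts object, and the N.total column are made up for illustration and are not part of the question.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data (not from the question), small enough to verify by hand
toy <- tibble(
  Fruits   = c("Apple", "Apple", "Apple", "Banana", "Banana", "Peach"),
  Ripeness = c("yes",   "no",    "yes",   "no",     "no",     "yes"),
  Mean     = c(1.5,     2.0,     1.7,     2.1,      1.9,      5.0)
)

counts <- toy %>%
  group_by(Fruits) %>%
  summarise(
    # Ripeness == "yes" is logical; sum() counts how many values are TRUE
    N.yes = sum(Ripeness == "yes"),
    # n() is the total number of rows in the group, shown for comparison
    N.total = n()
  )

counts
# Apple has 2 "yes" rows out of 3, Banana 0 out of 2, Peach 1 out of 1
```

A fruit with no "yes" rows (Banana here) gets a count of 0 instead of being dropped, which is one reason to prefer this over filtering to Ripeness == "yes" before counting.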

Upvotes: 3
