Rnoobie

Reputation: 139

How to obtain the average of two different row ranges in a specific column?

I have the following sample dataframe. The first column is the month and the second column is the number of surveys conducted each month.

month = c(1,2,3,4,5,6,7,8,9,10,11,12)
surveys = c(4,5,3,7,3,4,4,4,6,1,1,7)

df = data.frame(month, surveys)

I want to calculate the average number of surveys from May - August, and then, the average number of surveys for the remaining months (Jan - April PLUS September - December).

How do I do this using the dplyr package?

Upvotes: 0

Views: 52

Answers (3)

CristianGabriel

Reputation: 56

First, you need to install the dplyr package if you haven't already:

install.packages("dplyr")

Then load the package and use the filter() and summarize() functions to calculate each average:

library(dplyr)

df <- data.frame(month, surveys)

may_aug_avg <- df %>%
  filter(month >= 5 & month <= 8) %>%
  summarize(average = mean(surveys))

remaining_months_avg <- df %>%
  filter(!(month >= 5 & month <= 8)) %>%
  summarize(average = mean(surveys))

The first pipeline keeps only the rows for May through August and then averages the number of surveys over those months. The second pipeline excludes May through August and averages the number of surveys over the remaining months (January to April plus September to December).

Print may_aug_avg and remaining_months_avg to see the results. Each is a one-row data frame with a single column named average; for the sample data the values are 3.75 and 4.25.
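If plain numbers are more convenient than one-row data frames, a minimal sketch (assuming the sample data from the question) is to end each pipeline with pull(). The names may_aug_mean and remaining_mean are only illustrative.

```r
library(dplyr)

# Sample data from the question
month <- 1:12
surveys <- c(4, 5, 3, 7, 3, 4, 4, 4, 6, 1, 1, 7)
df <- data.frame(month, surveys)

# pull() extracts the single summary column as a plain numeric value
may_aug_mean <- df %>%
  filter(month >= 5 & month <= 8) %>%
  summarize(average = mean(surveys)) %>%
  pull(average)

remaining_mean <- df %>%
  filter(!(month >= 5 & month <= 8)) %>%
  summarize(average = mean(surveys)) %>%
  pull(average)

may_aug_mean    # 3.75
remaining_mean  # 4.25
```

Writing may_aug_avg$average on the one-row result would give the same number.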

Hope this helps!

Upvotes: 1

jpsmith

Reputation: 17174

Assuming the integers represent months, in dplyr, you could use group_by with a boolean TRUE/FALSE and find the mean with summarize:

df %>% group_by(MayAug = month %in% 5:8) %>% summarize(mean = mean(surveys))

#   MayAug  mean
#   <lgl>  <dbl>
# 1 FALSE   4.25
# 2 TRUE    3.75
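As a quick sanity check on the grouping, one option is to count the rows in each group with n() in the same summarize() call. This is only a sketch on the sample data from the question; the names result and n_months are illustrative.

```r
library(dplyr)

# Sample data from the question
month <- 1:12
surveys <- c(4, 5, 3, 7, 3, 4, 4, 4, 6, 1, 1, 7)
df <- data.frame(month, surveys)

# Group on the logical flag, then report the mean and the group size
result <- df %>%
  group_by(MayAug = month %in% 5:8) %>%
  summarize(mean = mean(surveys), n_months = n())

result
```

With twelve months of data, the TRUE group should contain 4 rows and the FALSE group 8, which confirms that the remaining months were pooled into a single group.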

Upvotes: 2

Jilber Urbina

Reputation: 61154

I first create a new factor variable, period, with descriptive labels, then group_by(period) and use summarise() to take the mean of each group:

df %>% 
  mutate(period = factor(between(month, 5,8), labels = c("Other months", "May-Aug"))) %>% 
  group_by(period) %>% 
  summarise(mean_surveys = mean(surveys))

 # A tibble: 2 × 2
  period       mean_surveys
  <fct>               <dbl>
1 Other months         4.25
2 May-Aug              3.75

Upvotes: 1
