Canada2015

Reputation: 197

R - Mean calculation using group_by based on Date column?

*Before posting, I went through this post, but it did not work with the Date format in my data:

Using R & dplyr to summarize - group_by, count, mean, sd*

---------------------------------------------------------------------

What I have:

I have a data frame with two columns (i.e., "Date" and "Average") which contains the daily average precipitation for 5 years.

Here is head and tail of this data frame:

> head(years_nc)
        Date    Average
1 2010-01-01 0.00207909
2 2010-01-02 0.00207909
3 2010-01-03 0.00207909
4 2010-01-04 0.00207909
5 2010-01-05 0.00207909
6 2010-01-06 0.00207909

> tail(years_nc)
          Date     Average
3334271 2014-12-26 0.004983558
3334272 2014-12-27 0.004983558
3334273 2014-12-28 0.004983558
3334274 2014-12-29 0.004983558
3334275 2014-12-30 0.004983558
3334276 2014-12-31 0.004983558

To make things clearer, you can download this data frame:

https://www.dropbox.com/s/7wozzxvu6uckqsu/MyData.csv?dl=1

My Goal:

I am trying to compute the mean of the "Average" column for each year separately.

This is my code to do so:

library(dplyr)
library(lubridate)

years_nc %>%
  group_by(Date) %>%
  summarize(avg_preci = mean(Average, na.rm = TRUE))

It returns only one value:

> 
   avg_preci
1 0.00195859

But I want R to:

(a) make a group for each year;

(b) then calculate the mean of the Average precipitation on a yearly basis.

In other words, I must have 5 mean values; one value per year.

What is my mistake in the code?

Could anybody help me with this problem?

Thanks.

Upvotes: 3

Views: 2506

Answers (1)

deepseefan

Reputation: 3791

You're almost there. First, make sure that your Date column is actually of class Date. Then, when you do the grouping, group by the year only, not by the full year-month-day date stored in your data frame. The script can be modified as follows.

years_nc$Date <- ymd(years_nc$Date)

years_nc %>%
  group_by(year(Date)) %>%
  summarize(avg_preci = mean(Average, na.rm = TRUE))
# A tibble: 5 x 2
#     `year(Date)` avg_preci
#           <dbl>     <dbl>
# 1         2010   0.00196
# 2         2011   0.00196
# 3         2012   0.00196
# 4         2013   0.00196
# 5         2014   0.00196
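To check the idea without downloading the file, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical and only mimic the shape of `years_nc`. Naming the grouping expression (`Year = ...`) gives the result a cleaner column name than `year(Date)`, and the `dplyr::` prefix is just a precaution in case another attached package also exports a function called `summarize`.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data (NOT the asker's file): one row per day for 2010-2011,
# with Date stored as character, as it typically is after reading a CSV.
toy <- data.frame(
  Date = as.character(seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day")),
  stringsAsFactors = FALSE
)
# Constant value per year so the expected yearly means are obvious (1 and 3).
toy$Average <- ifelse(substr(toy$Date, 1, 4) == "2010", 1, 3)

yearly <- toy %>%
  mutate(Date = ymd(Date)) %>%        # parse the character column into Date
  group_by(Year = year(Date)) %>%     # group by calendar year, not by day
  dplyr::summarize(avg_preci = mean(Average, na.rm = TRUE))

print(yearly)
```

This should give one row per year (two rows here), which is the same shape the five-year data produces.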

Upvotes: 4
