Reputation: 197
*Before posting, I went through the post below, but its solution did not work with the Date format in my data:
Using R & dplyr to summarize - group_by, count, mean, sd*
What I have:
I have a data frame with two columns ("Date" and "Average") that contains the daily average precipitation for 5 years.
Here are the head and tail of this data frame:
> head(years_nc)
        Date    Average
1 2010-01-01 0.00207909
2 2010-01-02 0.00207909
3 2010-01-03 0.00207909
4 2010-01-04 0.00207909
5 2010-01-05 0.00207909
6 2010-01-06 0.00207909
> tail(years_nc)
              Date     Average
3334271 2014-12-26 0.004983558
3334272 2014-12-27 0.004983558
3334273 2014-12-28 0.004983558
3334274 2014-12-29 0.004983558
3334275 2014-12-30 0.004983558
3334276 2014-12-31 0.004983558
To make things clearer, you can download this data frame here:
https://www.dropbox.com/s/7wozzxvu6uckqsu/MyData.csv?dl=1
My Goal:
I am trying to compute the mean of the "Average" column for each year separately.
This is my code to do so:
library(dplyr)
library(lubridate)
years_nc %>%
  group_by(Date) %>%
  summarize(avg_preci = mean(Average, na.rm = TRUE))
It returns only one value:
>
avg_preci
1 0.00195859
But I want R to:
(a) make a group for each year;
(b) then calculate the mean of the Average precipitation for each year.
In other words, I should get 5 mean values, one per year.
What is my mistake in the code?
Could anybody help me with this problem?
Thanks.
Upvotes: 3
Views: 2506
Reputation: 3791
You're almost there. First, make sure that your Date column is actually of class Date. Then, when you do the grouping, group by the year only, not by the full date (ymd) that is stored in your data frame. The script can be modified as follows.
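# Parse the Date column into a real Date object (lubridate)
years_nc$Date <- ymd(years_nc$Date)
# Group by the year extracted from each date, then average within each year
years_nc %>%
  group_by(year(Date)) %>%
  summarize(avg_preci = mean(Average, na.rm = TRUE))
# A tibble: 5 x 2
#   `year(Date)` avg_preci
#          <dbl>     <dbl>
# 1         2010   0.00196
# 2         2011   0.00196
# 3         2012   0.00196
# 4         2013   0.00196
# 5         2014   0.00196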
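The same approach can be tried on a small made-up data frame, which makes the result easy to check by hand. This is only a sketch: the `toy` data, its values, and the `Year` column name are hypothetical and not taken from the asker's file. Calling `dplyr::summarize` with its namespace is an extra precaution, on the assumption that another attached package (for example plyr) might mask `summarize` and collapse the result to a single row.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: one row per day for 2010 and 2011,
# with dates stored as character strings in year-month-day order.
toy <- data.frame(
  Date = as.character(seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day")),
  stringsAsFactors = FALSE
)
# Made-up values: 1 for every day in 2010, 3 for every day in 2011.
toy$Average <- ifelse(substr(toy$Date, 1, 4) == "2010", 1, 3)

yearly <- toy %>%
  mutate(Date = ymd(Date),      # parse the strings into Date objects
         Year = year(Date)) %>% # extract the year into a named column
  group_by(Year) %>%
  dplyr::summarize(avg_preci = mean(Average, na.rm = TRUE))

print(yearly)
```

Naming the grouping column (`Year`) gives a cleaner header than `year(Date)` and makes the result easier to join or plot afterwards.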
Upvotes: 4