Reputation: 2316
I have this data frame and I want replace/form new data set with average values for each month.
month value
Oct. 2012 52.7
Nov. 2012 53.0
Dec. 2012 54.1
Jan. 2013 53.1
Feb. 2013 53.6
Mar. 2013 53.8
Apr. 2013 54.1
May. 2013 54.6
Jun. 2013 54.8
Jul. 2013 54.3
Aug. 2013 54.5
Sep. 2013 54.5
Oct. 2013 54.0
Nov. 2013 54.2
Dec. 2013 54.8
Oct. 2012 37.2
Nov. 2012 37.4
Dec. 2012 38.4
. .
. .
As you can see that for each month of each year, like Oct. 2012
, Nov. 2012
and so on there are multiple values. Now to get the average/mean value for each of these separate month in a data frame I executed this command
data_mean <- data_frame %.% group_by(month) %.% summarize(value = mean(value))
It does correctly finds the mean value of each month but in the output it reorders the month alphabetically. So in output Oct. 2012
comes after Apr. 2013
and so on while it should come before. I want to calculate the mean month wise but do not want to reorder the output alphabetically. How to modify the above command or of there is some shorter way.
Upvotes: 2
Views: 618
Reputation: 270298
Change the month
column to a factor with appropriately ordered levels:
data_frame %>%
mutate(month = factor(substr(month, 1, 3), month.abb)) %>%
group_by(month) %>%
summarize(value = mean(value))
giving:
month value
1 Jan 53.10000
2 Feb 53.60000
3 Mar 53.80000
4 Apr 54.10000
5 May 54.60000
6 Jun 54.80000
7 Jul 54.30000
8 Aug 54.50000
9 Sep 54.50000
10 Oct 47.96667
11 Nov 48.20000
12 Dec 49.10000
Update
The question asks for the data by month; however, in a comment this was revised to ask for it by year/month. In that case use zoo to convert it to yearmon and then to numeric (since it appears that dplyr grouping does not support yearmon) and optionally back to yearmon at the end:
library(zoo)
data_frame %>%
mutate(month = as.numeric(as.yearmon(as.character(month), "%b. %Y"))) %>%
group_by(month) %>%
summarize(value = mean(value)) %>%
mutate(month = as.yearmon(month))
Upvotes: 4
Reputation: 3280
This is one step if your data is in data.table
:
library(data.table) # load library
DT <- as.data.table(DF) # convert data.frame to data.table
DT[,mean(value),by=month] # compute mean by month
This essentially does the job in a unique
type sort!
If you want to be explicit, here are two more steps:
DT[,`:=`(mon = factor(substr(month, 1, 3), month.abb),
yr = substr(month,6,9))] # create new columns for month and year
setkey(DT,yr,mon) # set keys, order is important
DT[,mean(value),by=key(DT)] # compute mean by key
This will give you what you want, and will scale much better for large arrays. (About 10-50 times faster in comparison to data.frame
.)
Note that data.table
has class data.frame
, but is much faster for this and similar operations.
Upvotes: 1
Reputation: 24623
You can also use tapply:
with(ddf, tapply(value, month, mean))
Apr. Aug. Dec. Feb. Jan. Jul. Jun. Mar. May. Nov. Oct. Sep.
54.10000 54.50000 49.10000 53.60000 53.10000 54.30000 54.80000 53.80000 54.60000 48.20000 47.96667 54.50000
Upvotes: 0
Reputation: 59425
If you're going to be doing a lot of this, you might consider treating this as a time series using the zoo
package.
library(zoo)
df$yrmon <- as.yearmon(df$month,"%b. %Y")
aggregate(value~yrmon,df,mean)
## yrmon value
## 1 Oct 2012 44.95
## 2 Nov 2012 45.20
## 3 Dec 2012 46.25
## 4 Jan 2013 53.10
## 5 Feb 2013 53.60
## 6 Mar 2013 53.80
## 7 Apr 2013 54.10
## 8 May 2013 54.60
## 9 Jun 2013 54.80
## 10 Jul 2013 54.30
## 11 Aug 2013 54.50
## 12 Sep 2013 54.50
## 13 Oct 2013 54.00
## 14 Nov 2013 54.20
## 15 Dec 2013 54.80
Upvotes: 1