Jason Donnald
Jason Donnald

Reputation: 2316

Error in finding average/mean of columns month wise in R

I have this data frame and I want replace/form new data set with average values for each month.

    month    value
  Oct. 2012  52.7
  Nov. 2012  53.0
  Dec. 2012  54.1
  Jan. 2013  53.1
  Feb. 2013  53.6
  Mar. 2013  53.8
  Apr. 2013  54.1
  May. 2013  54.6
  Jun. 2013  54.8
  Jul. 2013  54.3
  Aug. 2013  54.5
  Sep. 2013  54.5
  Oct. 2013  54.0
  Nov. 2013  54.2
  Dec. 2013  54.8
  Oct. 2012  37.2
  Nov. 2012  37.4
  Dec. 2012  38.4
      .        .
      .        .

As you can see that for each month of each year, like Oct. 2012, Nov. 2012 and so on there are multiple values. Now to get the average/mean value for each of these separate month in a data frame I executed this command

data_mean <- data_frame %.% group_by(month) %.% summarize(value = mean(value))

It does correctly finds the mean value of each month but in the output it reorders the month alphabetically. So in output Oct. 2012 comes after Apr. 2013 and so on while it should come before. I want to calculate the mean month wise but do not want to reorder the output alphabetically. How to modify the above command or of there is some shorter way.

Upvotes: 2

Views: 618

Answers (4)

G. Grothendieck
G. Grothendieck

Reputation: 270298

Change the month column to a factor with appropriately ordered levels:

data_frame %>% 
  mutate(month = factor(substr(month, 1, 3), month.abb)) %>%
  group_by(month) %>% 
  summarize(value = mean(value))

giving:

   month    value
1    Jan 53.10000
2    Feb 53.60000
3    Mar 53.80000
4    Apr 54.10000
5    May 54.60000
6    Jun 54.80000
7    Jul 54.30000
8    Aug 54.50000
9    Sep 54.50000
10   Oct 47.96667
11   Nov 48.20000
12   Dec 49.10000

Update

The question asks for the data by month; however, in a comment this was revised to ask for it by year/month. In that case use zoo to convert it to yearmon and then to numeric (since it appears that dplyr grouping does not support yearmon) and optionally back to yearmon at the end:

library(zoo)

data_frame %>% 
  mutate(month = as.numeric(as.yearmon(as.character(month), "%b. %Y"))) %>%
  group_by(month) %>% 
  summarize(value = mean(value)) %>%
  mutate(month = as.yearmon(month))

Upvotes: 4

Shambho
Shambho

Reputation: 3280

This is one step if your data is in data.table:

library(data.table)               # load library
DT <- as.data.table(DF)           # convert data.frame to data.table
DT[,mean(value),by=month]         # compute mean by month

This essentially does the job in a unique type sort!

If you want to be explicit, here are two more steps:

DT[,`:=`(mon = factor(substr(month, 1, 3), month.abb), 
         yr = substr(month,6,9))] # create new columns for month and year
setkey(DT,yr,mon)                 # set keys, order is important
DT[,mean(value),by=key(DT)]       # compute mean by key

This will give you what you want, and will scale much better for large arrays. (About 10-50 times faster in comparison to data.frame.)

Note that data.table has class data.frame, but is much faster for this and similar operations.

Upvotes: 1

rnso
rnso

Reputation: 24623

You can also use tapply:

with(ddf, tapply(value, month, mean))
    Apr.     Aug.     Dec.     Feb.     Jan.     Jul.     Jun.     Mar.     May.     Nov.     Oct.     Sep. 
54.10000 54.50000 49.10000 53.60000 53.10000 54.30000 54.80000 53.80000 54.60000 48.20000 47.96667 54.50000 

Upvotes: 0

jlhoward
jlhoward

Reputation: 59425

If you're going to be doing a lot of this, you might consider treating this as a time series using the zoo package.

library(zoo)
df$yrmon <- as.yearmon(df$month,"%b. %Y")
aggregate(value~yrmon,df,mean)
##       yrmon value
## 1  Oct 2012 44.95
## 2  Nov 2012 45.20
## 3  Dec 2012 46.25
## 4  Jan 2013 53.10
## 5  Feb 2013 53.60
## 6  Mar 2013 53.80
## 7  Apr 2013 54.10
## 8  May 2013 54.60
## 9  Jun 2013 54.80
## 10 Jul 2013 54.30
## 11 Aug 2013 54.50
## 12 Sep 2013 54.50
## 13 Oct 2013 54.00
## 14 Nov 2013 54.20
## 15 Dec 2013 54.80

Upvotes: 1

Related Questions