Reputation: 27
Say I generate some data with this code:
month<-c(rep(1,7),rep(2,7),rep(3,7))
date<-rep(c(rep(1,2),rep(2,3),rep(3,2)),3)
value<-rnorm(21)
df<-cbind(month,date,value)
So now I have something like this:
month date value
[1,] 1 1 -0.04256470
[2,] 1 1 -2.50922102
[3,] 1 2 -0.50458814
[4,] 1 2 -1.00133322
[5,] 1 2 0.70297514
[6,] 1 3 0.79316448
[7,] 1 3 0.66798947
[8,] 2 1 1.60548790
[9,] 2 1 -0.42484680
[10,] 2 2 -0.33906887
[11,] 2 2 1.02457883
[12,] 2 2 0.64175917
[13,] 2 3 -0.03832247
[14,] 2 3 0.86878829
[15,] 3 1 1.46691690
[16,] 3 1 0.77897932
[17,] 3 2 -1.02759643
[18,] 3 2 0.15902324
[19,] 3 2 1.36580741
[20,] 3 3 -1.70749048
[21,] 3 3 0.11327990
How would I go about taking the average value for each date within a month?
In this case I would want my output to look like this:
month date avgvalue
1 1 -1.27589
1 2 -0.267649
1 3 0.730577
2 1 0.590321
...
I would really appreciate the help, thank you :)
Upvotes: 0
Views: 2580
Reputation: 193687
You tagged your question with tapply, so here's a tapply answer:
tapply(df[, "value"], INDEX=list(df[, "month"], df[, "date"]), FUN=mean)
# 1 2 3
# 1 -0.42965680 0.6943236 0.04505399
# 2 0.55021401 -0.3138895 -0.40966078
# 3 0.05676266 0.5212944 0.12521106
data.frame(as.table(
tapply(df[, "value"], INDEX=list(df[, "month"], df[, "date"]), FUN=mean)))
# Var1 Var2 Freq
# 1 1 1 -0.42965680
# 2 2 1 0.55021401
# 3 3 1 0.05676266
# 4 1 2 0.69432363
# 5 2 2 -0.31388954
# 6 3 2 0.52129439
# 7 1 3 0.04505399
# 8 2 3 -0.40966078
# 9 3 3 0.12521106
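The Var1/Var2/Freq headers above can be replaced with the column names from the question. A minimal sketch, assuming you name the INDEX list and pass responseName to as.data.frame; set.seed(1) is my own addition for reproducibility, and "avgvalue" is just the name used in the question:

```r
# Rebuild the question's matrix; set.seed(1) is an assumption, not in the original
set.seed(1)
month <- c(rep(1, 7), rep(2, 7), rep(3, 7))
date  <- rep(c(rep(1, 2), rep(2, 3), rep(3, 2)), 3)
value <- rnorm(21)
df <- cbind(month, date, value)

# Naming the INDEX list names the dimensions of the tapply result
m <- tapply(df[, "value"],
            INDEX = list(month = df[, "month"], date = df[, "date"]),
            FUN = mean)

# responseName sets the name of the value column (instead of "Freq")
out <- as.data.frame(as.table(m), responseName = "avgvalue")
print(out)
```

Note that month and date come back as factors here, so convert them if you need numbers again.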
More common approaches, though, are aggregate (mentioned), plyr (mentioned), data.table and (recently) dplyr. The data.table and dplyr approaches are below.
library(data.table)
DT <- data.table(df)
DT[, mean(value), by = list(month, date)]
library(dplyr)
DF <- data.frame(df)
# %>% replaces the older %.% chaining operator, which has been removed from dplyr
DF %>% group_by(month, date) %>% summarise(mean(value))
Much less common would be ave + unique:
unique(within(data.frame(df), {
MV <- ave(value, month, date)
rm(value)
}))
But they all get you to the same place.
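If you want to convince yourself of that, here is a rough check using only the base R approaches (tapply, aggregate, ave + unique). It assumes a fixed seed, which I added so the run is reproducible; the object names by_tapply, by_aggregate and by_ave are just for illustration:

```r
# Rebuild the question's matrix; set.seed(1) is an assumption, not in the original
set.seed(1)
month <- c(rep(1, 7), rep(2, 7), rep(3, 7))
date  <- rep(c(rep(1, 2), rep(2, 3), rep(3, 2)), 3)
value <- rnorm(21)
df <- cbind(month, date, value)

# tapply, flattened to long form (Var1 = month, Var2 = date, Freq = mean)
by_tapply <- data.frame(as.table(
  tapply(df[, "value"], INDEX = list(df[, "month"], df[, "date"]), FUN = mean)))

# aggregate with the same grouping; the mean ends up in column x
by_aggregate <- aggregate(df[, "value"],
                          by = list(month = df[, "month"], date = df[, "date"]),
                          FUN = mean)

# ave + unique; reorder so month varies fastest, like the other two results
by_ave <- unique(within(data.frame(df), {
  MV <- ave(value, month, date)
  rm(value)
}))
by_ave <- by_ave[order(by_ave$date, by_ave$month), ]

# Compare the group means across the three approaches
same <- isTRUE(all.equal(by_tapply$Freq, by_aggregate$x, check.attributes = FALSE)) &&
  isTRUE(all.equal(by_aggregate$x, by_ave$MV, check.attributes = FALSE))
print(same)
```

The only thing that differs between the results is the column naming and the row order, which is why by_ave gets sorted before the comparison.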
Upvotes: 1
Reputation: 44340
You can use aggregate:
aggregate(df[,3], by=list(month=df[,1], date=df[,2]), mean)
# month date x
# 1 1 1 0.5661431
# 2 2 1 0.1843661
# 3 3 1 1.8339898
# 4 1 2 1.2053077
# 5 2 2 -0.2575551
# 6 3 2 -0.4464268
# 7 1 3 -0.7154689
# 8 2 3 0.7895702
# 9 3 3 0.4853081
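If you would rather refer to the columns by name than by position, aggregate also has a formula interface. A small sketch, assuming the matrix is first converted with as.data.frame; the seed and the rename to "avgvalue" (the name used in the question) are my additions:

```r
# Rebuild the question's matrix; set.seed(1) is an assumption, not in the original
set.seed(1)
month <- c(rep(1, 7), rep(2, 7), rep(3, 7))
date  <- rep(c(rep(1, 2), rep(2, 3), rep(3, 2)), 3)
value <- rnorm(21)
df <- cbind(month, date, value)

# Formula interface: mean of value for each month/date combination
res <- aggregate(value ~ month + date, data = as.data.frame(df), FUN = mean)

# Rename the result column and sort rows by month, then date
names(res)[names(res) == "value"] <- "avgvalue"
res <- res[order(res$month, res$date), ]
print(res)
```

The sort at the end puts the rows in the month-then-date order shown in the question.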
Upvotes: 2
Reputation: 3147
library("plyr")
df <- data.frame(df)
ddply(df, .(month,date), summarize, avgvalue=mean(value))
Upvotes: 1