user3036416
user3036416

Reputation: 1255

Splitting sequentially a data frame R

I have a data frame like this

V1 V2 V3 V4  month   year
1 -1  9  1     1      1989
1 -1  9  1     1      1989
4 -1  9  1     2      1989
3  2  7  3     1      1990
4  4  8  2     2      1990 
3  6  9  2     2      1990 
4  7  0  2     2      1990 
5  8  4  2     2      1990 

where the first 4 rows indicate the value of the quantity A in the cell 1,2,3,4 and the last two columns give the month and the year. What I would like to do is to calculate the monthly average of A for every cell and so to end up with a list

 V1
   1989
         <A>jen  <A>feb ..
           1       4
   1990
         <A>jen  <A>feb ..
           3       4
 V2

 V3

Many thanks

Upvotes: 0

Views: 92

Answers (1)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

I was still hoping for something a little bit more precise in your question as to what your desired output is exactly, but since you haven't updated that part, I'll provide two options.

Option 1

aggregate seems to be a pretty straightforward tool for this task, particularly if sticking with a "wide" format would be fine for your needs.

aggregate(. ~ year + month, mydf, mean)
#   year month V1    V2   V3 V4
# 1 1989     1  1 -1.00 9.00  1
# 2 1990     1  3  2.00 7.00  3
# 3 1989     2  4 -1.00 9.00  1
# 4 1990     2  4  6.25 5.25  2

Option 2

If you prefer your data in a "long" format, you should explore the "reshape2" package which can handle the reshaping and aggregating in just a few steps.

library(reshape2)
mydfL <- melt(mydf, id.vars = c("year", "month")) 

## The next step is purely cosmetic...
mydfL$month <- factor(month.abb[mydfL$month], month.abb, ordered = TRUE)
head(mydfL)
#   year month variable value
# 1 1989   Jan       V1     1
# 2 1989   Jan       V1     1
# 3 1989   Feb       V1     4
# 4 1990   Jan       V1     3
# 5 1990   Feb       V1     4
# 6 1990   Feb       V1     3

## This is the actual aggregation and reshaping step...    
out <- dcast(mydfL, variable + year ~ month, 
             value.var = "value", fun.aggregate = mean)
out
#   variable year Jan   Feb
# 1       V1 1989   1  4.00
# 2       V1 1990   3  4.00
# 3       V2 1989  -1 -1.00
# 4       V2 1990   2  6.25
# 5       V3 1989   9  9.00
# 6       V3 1990   7  5.25
# 7       V4 1989   1  1.00
# 8       V4 1990   3  2.00

Upvotes: 1

Related Questions