Reputation: 569
I have a data frame containing years in the first column, months in the second, days in the third, and then rainfall data for different cities (LON and NYC) in the fourth and fifth columns, as shown below:
mat1 = matrix(c(rep(1979, each=360), rep(1:12, each=30), rep(seq(1, 30, by=1), times=12),
rep(seq(5, 25, by=5), times=72), rep(seq(1, 9, by=1), times=40)), nrow=360, ncol=5)
colnames(mat1) = c("Year", "Month", "Day", "LON", "NYC")
I want to calculate monthly sums for each city. The output should take the form:
LON NYC
Jan x x
Feb x x
Mar x x
................
I have tried:
aggregate(LON ~ Month + Year, mat1, sum)
but this does not output data in the format I want, and crucially it also only allows me to calculate for one city (column) at a time. How can I adapt the above to work for all cities and output in the desired format?
Upvotes: 1
Views: 222
Reputation: 10385
Like this?
> aggregate(mat1[,4:5],list(mat1[,2]),sum)
Group.1 LON NYC
1 1 450 141
2 2 450 150
3 3 450 159
4 4 450 141
5 5 450 150
6 6 450 159
7 7 450 141
8 8 450 150
9 9 450 159
10 10 450 141
11 11 450 150
12 12 450 159
Upvotes: 1
Reputation: 13319
Arguably complicating things but a dplyr
-tidyr
option. This uses the newer pivot_*
functions in place of gather
/spread
. A purely dplyr
option shown by @www is probably more efficient.
as.data.frame(mat1) %>%
pivot_longer(LON:NYC,names_to = "city") %>%
group_by(Year,Month,city) %>%
summarise(Sum = sum(value)) %>%
pivot_wider(names_from = city,values_from = Sum)
# A tibble: 12 x 4
# Groups: Year, Month [12]
Year Month LON NYC
<dbl> <dbl> <dbl> <dbl>
1 1979 1 450 141
2 1979 2 450 150
3 1979 3 450 159
4 1979 4 450 141
5 1979 5 450 150
6 1979 6 450 159
7 1979 7 450 141
8 1979 8 450 150
9 1979 9 450 159
10 1979 10 450 141
11 1979 11 450 150
12 1979 12 450 159
Upvotes: 2
Reputation: 39174
A solution using dplyr
. We need to convert the matrix to data frame before using the group_by
and summarize_at
function.
library(dplyr)
mat1_sum <- mat1 %>%
as.data.frame() %>%
group_by(Month) %>%
summarize_at(vars(LON, NYC), sum)
mat1_sum
# # A tibble: 12 x 3
# Month LON NYC
# <dbl> <dbl> <dbl>
# 1 1 450 141
# 2 2 450 150
# 3 3 450 159
# 4 4 450 141
# 5 5 450 150
# 6 6 450 159
# 7 7 450 141
# 8 8 450 150
# 9 9 450 159
# 10 10 450 141
# 11 11 450 150
# 12 12 450 159
Upvotes: 2