Reputation: 5
I have found a few options from other posts but I am having trouble reproducing the code for my specific needs.
I have climate data that I would like to find the average based on season from 1910-2015 for specific locations.
Here is an example of what I need: the average PPT for CENTROID_ID c1763_1 in 1911 for Winter (Month 12 of the preceding year [1910], month 1 and 2 of year in question [1911]), Spring (months 3,4,5 in 1911), Summer (months 6,7,8 in 1911) and Fall (months 9,10,11 in 1911). This would then need to be done for all individual centroid IDs for every year. I have over 400 unique CENTROID_IDs covering years 1910-2015.
I envision the new dataframe to have the columns CENTROID_ID, YEAR, WINT_PPT, SPR_PPT, SUM_PPT, FALL_PPT.
CENTROID_ID YEAR MONTH PPT
1 c1763_1 1910 1 52.639
2 c1763_1 1910 2 20.870
3 c1763_1 1910 3 21.706
4 c1763_1 1910 4 9.347
5 c1763_1 1910 5 1.201
6 c1763_1 1910 6 11.267
7 c1763_1 1910 7 41.870
8 c1763_1 1910 8 61.260
9 c1763_1 1910 9 27.815
10 c1763_1 1910 10 67.377
11 c1763_1 1910 11 24.719
12 c1763_1 1910 12 30.212
13 c1763_1 1911 1 88.728
14 c1763_1 1911 2 50.035
15 c1763_1 1911 3 37.720
16 c1763_1 1911 4 12.831
17 c1763_1 1911 5 0.739
18 c1763_1 1911 6 18.198
19 c1763_1 1911 7 74.731
20 c1763_1 1911 8 40.873
21 c1763_1 1911 9 86.340
22 c1763_1 1911 10 36.423
23 c1763_1 1911 11 12.491
24 c1763_1 1911 12 19.428
25 c1763_1 1912 1 11.010
26 c1763_1 1912 2 16.339
27 c1763_1 1912 3 72.017
28 c1763_1 1912 4 25.887
29 c1763_1 1912 5 5.314
30 c1763_1 1912 6 8.595
31 c1763_1 1912 7 47.781
32 c1763_1 1912 8 51.188
33 c1763_1 1912 9 10.931
34 c1763_1 1912 10 119.725
35 c1763_1 1912 11 10.420
36 c1763_1 1912 12 8.777
37 c1763_1 1913 1 27.771
38 c1763_1 1913 2 62.622
39 c1763_1 1913 3 17.533
40 c1763_1 1913 4 8.008
41 c1763_1 1913 5 1.423
42 c1763_1 1913 6 3.773
43 c1763_1 1913 7 42.982
44 c1763_1 1913 8 40.541
45 c1763_1 1913 9 58.495
46 c1763_1 1913 10 22.729
47 c1763_1 1913 11 48.130
48 c1763_1 1913 12 32.049
49 c1763_1 1914 1 104.197
50 c1763_1 1914 2 31.707
Upvotes: 0
Views: 107
Reputation: 1495
Is this it?
library(dplyr)
library(tibble)
# make fake data
dates <- expand.grid(1910:1950, 1:12)
dates <- dates[order(dates$Var1), ]
data <- tibble(
CENTROID_ID = rep("c1763_1", 240),
YEAR = dates$Var1[1:240],
MONTH = dates$Var2[1:240],
PPT = runif(min = 1, max = 100, n = 240)
)
And then we can determine the SEASON
based on YEAR
and MONTH
, group and calculate the mean per group:
data <- data %>%
mutate(SEASON = case_when(
MONTH == 12 | MONTH == 1 | MONTH == 2 ~ "WINTER",
MONTH == 3 | MONTH == 4 | MONTH == 5 ~ "SPRING",
MONTH == 6 | MONTH == 7 | MONTH == 8 ~ "SUMMER",
MONTH == 9 | MONTH == 10 | MONTH == 11 ~ "AUTUMN",
)) %>%
group_by(CENTROID_ID, YEAR, SEASON) %>%
summarise(PPT_AVG = mean(PPT))
With this result:
> data
# A tibble: 80 x 4
# Groups: CENTROID_ID, YEAR [20]
CENTROID_ID YEAR SEASON PPT_AVG
<chr> <int> <chr> <dbl>
1 c1763_1 1910 AUTUMN 35.7
2 c1763_1 1910 SPRING 44.3
3 c1763_1 1910 SUMMER 63.3
4 c1763_1 1910 WINTER 37.1
5 c1763_1 1911 AUTUMN 40.7
6 c1763_1 1911 SPRING 52.3
7 c1763_1 1911 SUMMER 36.7
8 c1763_1 1911 WINTER 10.7
9 c1763_1 1912 AUTUMN 45.4
10 c1763_1 1912 SPRING 45.7
# ... with 70 more rows
UPDATE
To shift the year we need to introduce lead(YEAR, 1)
. Updated the code and included SEASON
as a factor with defined levels such that it sorts in a decent chronological order.
data <- data %>%
mutate(SEASON = case_when(
MONTH == 12 | MONTH == 1 |MONTH == 2 ~ "WINTER",
MONTH == 3 | MONTH == 4 | MONTH == 5 ~ "SPRING",
MONTH == 6 | MONTH == 7 | MONTH == 8 ~ "SUMMER",
MONTH == 9 | MONTH == 10 | MONTH == 11~ "AUTUMN",
)) %>%
mutate(SEASON = factor(SEASON, levels = c("WINTER", "SPRING", "SUMMER", "AUTUMN"))) %>%
mutate(YEAR_LEAD = lead(YEAR, 1)) %>%
group_by(CENTROID_ID, YEAR_LEAD, SEASON) %>%
summarise(PPT_AVG = mean(PPT),
PPT_MIN = min(PPT),
PPT_MAX = max(PPT))
With this result:
> data
# A tibble: 81 x 6
# Groups: CENTROID_ID, YEAR_LEAD [21]
CENTROID_ID YEAR_LEAD SEASON PPT_AVG PPT_MIN PPT_MAX
<chr> <int> <fct> <dbl> <dbl> <dbl>
1 c1763_1 1910 WINTER 83.5 81.4 85.7
2 c1763_1 1910 SPRING 72.3 52.7 96.0
3 c1763_1 1910 SUMMER 49.9 10.9 90.0
4 c1763_1 1910 AUTUMN 26.4 7.17 63.1
5 c1763_1 1911 WINTER 60.9 19.0 92.6
6 c1763_1 1911 SPRING 62.9 58.6 67.4
7 c1763_1 1911 SUMMER 49.2 23.7 76.4
8 c1763_1 1911 AUTUMN 43.9 15.1 84.4
9 c1763_1 1912 WINTER 38.5 18.4 67.9
10 c1763_1 1912 SPRING 72.1 53.4 93.9
# ... with 71 more rows
Upvotes: 1
Reputation: 107652
Simply assign a SEASON
column then aggregate
:
df <- within(df, {
SEASON <- ifelse(MONTH %in% c(12, 1, 2), 'Winter',
ifelse(MONTH %in% c(3, 4, 5), 'Spring',
ifelse (MONTH %in% c(6, 7, 8), 'Summer',
ifelse(MONTH %in% c(9, 10, 11), 'Fall', NA)
)
)
)
YEAR <- ifelse(MONTH == 12, YEAR + 1, YEAR)
})
agg_df <- aggregate(PPT ~ CENTROID_ID + SEASON, df, FUN=mean)
agg_df
# CENTROID_ID SEASON PPT
# 1 c1763_1 Fall 43.79958
# 2 c1763_1 Spring 17.81050
# 3 c1763_1 Summer 36.92158
# 4 c1763_1 Winter 39.74171
Should you need multiple aggregations of PPT
:
agg_raw <- aggregate(PPT ~ CENTROID_ID + SEASON, df,
FUN=function(x) c(min=min(x), mean=mean(x), max=max(x)))
agg_df <- do.call(data.frame, agg_raw)
agg_df
# CENTROID_ID SEASON PPT.min PPT.mean PPT.max
# 1 c1763_1 Fall 10.420 43.79958 119.725
# 2 c1763_1 Spring 0.739 17.81050 72.017
# 3 c1763_1 Summer 3.773 36.92158 74.731
# 4 c1763_1 Winter 8.777 39.74171 104.197
Upvotes: 0
Reputation: 39667
You can create an auxiliary column of x$YEAR + x$MONTH/12
which helps bringing December for last year to current year. And then simply use aggregate
of PPT
over CENTROID_ID
, floor(aux)
and floor(aux %% 1 * 4)
. Then reshape
to bring the seasons in the same row.
x$aux <- x$YEAR + x$MONTH/12
y <- aggregate(PPT ~ CENTROID_ID + cbind(YEAR=floor(aux)) + cbind(SEASON=c("WINT",
"SPR", "SUM", "FALL")[1+floor(aux %% 1 * 4)]), x, mean)
reshape(do.call(data.frame, y), v.names = "PPT", timevar = "SEASON", idvar =
c("CENTROID_ID", "YEAR"), direction = "wide")
# CENTROID_ID YEAR PPT.FALL PPT.SPR PPT.SUM PPT.WINT
#1 c1763_1 1910 39.97033 10.75133 38.13233 36.75450
#2 c1763_1 1911 45.08467 17.09667 44.60067 56.32500
#3 c1763_1 1912 47.02533 34.40600 35.85467 15.59233
#4 c1763_1 1913 43.11800 8.98800 29.09867 33.05667
#17 c1763_1 1914 NA NA NA 55.98433
Upvotes: 1