Reputation: 377
I am very new to programming and recently started experimenting with R for data analysis. I am trying to generate a new column in my data frame based on values from another column, and then add up the rainfall total for each group. I obtained my climate data from the PRISM Climate Group site and used the following code to separate the date field (e.g. 1980-01) into years and months:
climate <- tidyr::separate(climate,date, c("year", "month"), sep = "-")
My question is how can I go about adding a new column which adds text based on the month?
My current pseudocode approach is
if climate$month == 1,2,3 then climate$season == winter
else climate$month == 4,5,6 then climate$season == spring
else climate$month == 7,8,9 then climate$season == summer
else climate$month == 10,11,12 then climate$season == fall
My goal is to generate a new df with the calculated sum for each season's rainfall of each year while avoiding the use of Excel
Thanks for the advice!
Solved. Here is the final working code for future reference:
#Read in PRISM data
prism <- read.csv('PRISM.csv')
#Separate date into year and month
prism <- tidyr::separate(prism,date, c("year", "month"), sep = "-")
#Convert factor variable into numeric
library(dplyr)
prism <- prism %>% mutate(month= as.numeric(as.character(month)))
#Generate new season column based on month
prism <- prism %>% mutate(season = case_when(
month < 4 ~ "winter",
month < 7 ~ "spring",
month < 10 ~ "summer",
month < 13 ~ "fall",
TRUE ~ NA_character_
))
#Generate new data frame with the sum of each variable per year and season
clima <- prism %>%
group_by(year, season) %>%
summarise(
ppt_mm = sum(ppt_mm),
tmin_c = sum(tmin_c),
tmean_c = sum(tmean_c),
tmax_c = sum(tmax_c),
vdpmin_hpa = sum(vdpmin_hpa),
vdpmax_hpa = sum(vdpmax_hpa)
)
#By Season
spring <- clima[clima$season=="spring", ]
summer <- clima[clima$season=="summer", ]
fall <- clima[clima$season=="fall", ]
winter <- clima[clima$season=="winter", ]
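For readers adapting this, the repeated sum() calls can be written more compactly with dplyr::across() (available in dplyr 1.0.0 and later). The following is only a sketch on a made-up toy table; the data frame toy and its values are assumptions for illustration, not PRISM data.

```r
library(dplyr)

# Toy stand-in for the PRISM table after the season column has been added.
# All values are invented for illustration.
toy <- data.frame(
  year   = c("1980", "1980", "1980", "1981"),
  season = c("winter", "winter", "spring", "winter"),
  ppt_mm = c(10, 20, 30, 40),
  tmin_c = c(-5, -3, 4, -6)
)

# Apply sum() to every listed column in one call instead of naming each one.
clima_toy <- toy %>%
  group_by(year, season) %>%
  summarise(across(c(ppt_mm, tmin_c), sum), .groups = "drop")

print(clima_toy)
```

The same pattern accepts a different function, so temperature columns could be averaged with mean instead of summed if that suits the analysis better.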
Upvotes: 1
Views: 259
Reputation: 11
In a vector-oriented language like R, instead of writing code that evaluates a lot of conditions to assign values (i.e., switch or cascading if statements), you can usually dramatically simplify things by creating a vector that encodes the mapping and indexing it with the original data.
For your problem, you want to map the values 1:12 to the 4 seasons. So you create a vector with 12 elements, each of which is the season value for the corresponding month. Then you index that vector by the values of the "month" column in your dataframe, and that gives you seasons:
m2s <- c("DJF","DJF","MAM","MAM","MAM","JJA","JJA","JJA","SON","SON","SON","DJF")
prism$season <- m2s[prism$month]
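As a self-contained sketch of the same lookup, here it is applied to a small made-up table and followed by a per-season rainfall total using base R's aggregate(). The data frame toy and its values are assumptions for illustration. Note that tidyr::separate() returns character columns by default (e.g. "01"), so the month is converted with as.integer() before it is used as an index.

```r
# Toy data shaped like the output of tidyr::separate(); values are invented.
toy <- data.frame(
  year   = c("1980", "1980", "1980", "1981"),
  month  = c("01", "04", "12", "07"),
  ppt_mm = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Lookup vector: position i holds the season label for month i.
m2s <- c("DJF","DJF","MAM","MAM","MAM","JJA","JJA","JJA","SON","SON","SON","DJF")

# Convert "01" -> 1 and so on, then index the lookup vector.
toy$season <- m2s[as.integer(toy$month)]

# Sum rainfall per year and season.
totals <- aggregate(ppt_mm ~ year + season, data = toy, FUN = sum)
print(totals)
```

With this labelling, December is counted with January and February of the same calendar year, which may or may not be what the analysis calls for.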
In this case, we're using the numeric month value as a numeric index into the vector. If you have a vector of month names instead of numbers, it still works as long as your mapping vector is named. A factor has to be converted with as.character() first, because indexing with a factor uses its integer codes rather than its labels:
> m2s <- rep(each=3, c("DJF","MAM","JJA","SON"))[c(2:12,1)]
> names(m2s) <- month.abb
> print(m2s)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
"DJF" "DJF" "MAM" "MAM" "MAM" "JJA" "JJA" "JJA" "SON" "SON" "SON" "DJF"
> test <- factor(sample(month.abb, 8, replace=TRUE))
> print(test)
[1] May Apr Jul Sep Dec Jun Jul May
Levels: Apr Dec Jul Jun May Sep
> print(m2s[as.character(test)])
May Apr Jul Sep Dec Jun Jul May
"MAM" "MAM" "JJA" "SON" "DJF" "JJA" "JJA" "MAM"
Upvotes: 1
Reputation: 6226
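You can use dplyr::case_when. It is easier to read than chained if/else conditions, and the first condition that matches wins:
library(dplyr)
#Example data: months 1-12 plus an invalid value to show the fallback
df <- data.frame(month = 1:13)
df %>% mutate(season = case_when(
month < 4 ~ "winter",
month < 7 ~ "spring",
month < 10 ~ "summer",
month < 13 ~ "fall",
TRUE ~ NA_character_
))
month season
1 1 winter
2 2 winter
3 3 winter
4 4 spring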
4 4 spring
5 5 spring
6 6 spring
7 7 summer
8 8 summer
9 9 summer
10 10 fall
11 11 fall
12 12 fall
13 13 <NA>
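If dplyr is not available, the same thresholds can be expressed in base R with cut(). This is a sketch under the assumption that the seasons are the three-month blocks from the question (1-3 winter, 4-6 spring, 7-9 summer, 10-12 fall); the variable names are illustrative.

```r
# Months 1-12 plus one out-of-range value to show the NA fallback.
month <- 1:13

# cut() bins values into right-closed intervals: (0,3], (3,6], (6,9], (9,12].
# Anything outside those intervals becomes NA.
season <- as.character(cut(
  month,
  breaks = c(0, 3, 6, 9, 12),
  labels = c("winter", "spring", "summer", "fall")
))

print(data.frame(month, season))
```

cut() returns a factor, so as.character() is used here to get plain strings comparable to the case_when() result.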
Upvotes: 0