k3r0

Reputation: 377

Structuring climate data in R

I am very new to programming and recently started experimenting with R for data analysis. I am trying to create a new column in my data frame based on the values in another column, and then add up the rainfall total for each group. I obtained my climate data from the PRISM Climate Group site and used the following code to separate the date field (e.g. 1980-01) into year and month:

climate <- tidyr::separate(climate,date, c("year", "month"), sep = "-") 

My question is how can I go about adding a new column which adds text based on the month?

My current pseudocode approach is

if climate$month == 1,2,3 then climate$season == winter

else climate$month == 4,5,6 then climate$season == spring

else climate$month == 7,8,9 then climate$season == summer

else climate$month == 10,11,12 then climate$season == fall

My goal is to generate a new data frame with the summed rainfall for each season of each year, without resorting to Excel.

Thanks for the advice!

Solved. Here is the final working code for future reference:

# Read in PRISM data
prism <- read.csv('PRISM.csv')

# Separate date into year and month
prism <- tidyr::separate(prism, date, c("year", "month"), sep = "-")

# Convert month to numeric (as.character() also covers the case where it is a factor)
library(dplyr)
prism <- prism %>% mutate(month = as.numeric(as.character(month)))

#Generate new season column based on month
prism <- prism %>% mutate(season = case_when(
  month < 4 ~ "winter",
  month < 7 ~ "spring",
  month < 10 ~ "summer",
  month < 13 ~ "fall",
  TRUE ~ NA_character_
))

# Generate a new data frame with the per-year sum of each season's values
clima <- prism %>%
  group_by(year, season) %>%
  summarise(
    ppt_mm     = sum(ppt_mm),
    tmin_c     = sum(tmin_c),
    tmean_c    = sum(tmean_c),
    tmax_c     = sum(tmax_c),
    vdpmin_hpa = sum(vdpmin_hpa),
    vdpmax_hpa = sum(vdpmax_hpa)
  )

#By Season
spring <- clima[clima$season=="spring", ]
summer <- clima[clima$season=="summer", ] 
fall <- clima[clima$season=="fall", ] 
winter <- clima[clima$season=="winter", ]
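
As an aside, the four per-season subsets could also be produced in one step with base R's split(), which returns a named list of data frames. A minimal sketch, using a made-up stand-in for clima (the values are invented; only the column names follow the code above):

```r
# Hypothetical stand-in for the summarised table (values are invented)
clima <- data.frame(
  year   = rep(c(1980, 1981), each = 4),
  season = rep(c("winter", "spring", "summer", "fall"), times = 2),
  ppt_mm = c(150, 120, 210, 300, 140, 130, 200, 310)
)

# One data frame per season, named by season
by_season <- split(clima, clima$season)

# Same rows as clima[clima$season == "spring", ]
spring <- by_season[["spring"]]
print(spring)
```

Keeping the pieces in one list makes it easier to loop over seasons later, e.g. with lapply().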

Upvotes: 1

Views: 259

Answers (2)

Seth McGinnis

Reputation: 11

In a vector-oriented language like R, instead of writing code that evaluates a lot of conditions to assign values (i.e., switch or cascading if statements), you can usually dramatically simplify things by creating a vector that encodes the mapping and indexing it with the original data.

For your problem, you want to map the values 1:12 to the 4 seasons. So you create a vector with 12 elements, each of which is the season value for the corresponding month. Then you index that vector by the values of the "month" column in your dataframe, and that gives you seasons:

m2s <- c("DJF","DJF","MAM","MAM","MAM","JJA","JJA","JJA","SON","SON","SON","DJF")
prism$season <- m2s[prism$month]
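
As a rough end-to-end sketch of this lookup approach, the season column can be built by indexing and the seasonal totals computed with base R's aggregate(). The toy data frame and its values are made up for illustration; only the column names year, month, and ppt_mm follow the question.

```r
# Toy data: hypothetical monthly rainfall for one year (values are invented)
prism <- data.frame(
  year   = rep(1980, 12),
  month  = 1:12,
  ppt_mm = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120)
)

# Lookup vector: element i is the season for month i
m2s <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
         "JJA", "JJA", "SON", "SON", "SON", "DJF")

# Index the lookup vector by the numeric month column
prism$season <- m2s[prism$month]

# Sum rainfall per year and season
totals <- aggregate(ppt_mm ~ year + season, data = prism, FUN = sum)
print(totals)
```

Note that grouping by calendar year puts December in the same DJF group as January and February of that same year, which may or may not be the winter you want.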

In this case, we're using the numeric month value as a numeric index into the vector. If you had a vector of month names instead of numbers, it would still work as long as your mapping vector is named. One caveat for factors: indexing with a factor uses its underlying integer codes rather than its labels, so convert it with as.character() first:

> m2s <- rep(c("DJF","MAM","JJA","SON"), each=3)[c(2:12,1)]
> names(m2s) <- month.abb
> print(m2s)
  Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec 
"DJF" "DJF" "MAM" "MAM" "MAM" "JJA" "JJA" "JJA" "SON" "SON" "SON" "DJF" 
 
> test <- factor(sample(month.abb, 8, replace=TRUE))
> print(test)
[1] May Apr Jul Sep Dec Jun Jul May
Levels: Apr Dec Jul Jun May Sep

> print(m2s[as.character(test)])
  May   Apr   Jul   Sep   Dec   Jun   Jul   May 
"MAM" "MAM" "JJA" "SON" "DJF" "JJA" "JJA" "MAM" 

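The same named-vector idea also covers month values that are still character strings. The question's dates look like 1980-01, so after splitting, the month would be "01" rather than 1. Assuming that zero-padded format, one possible sketch names the lookup vector with matching strings (the sample dates below are made up):

```r
# Lookup vector named by zero-padded month strings "01".."12"
m2s <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
         "JJA", "JJA", "SON", "SON", "SON", "DJF")
names(m2s) <- sprintf("%02d", 1:12)

# Hypothetical dates in the "YYYY-MM" format described in the question
dates <- c("1980-01", "1980-04", "1980-07", "1980-12")

# Take the part after the hyphen as a character month
month_chr <- sub("^[0-9]{4}-", "", dates)

# Character indexing matches on names; unname() drops the month labels
season <- unname(m2s[month_chr])
print(season)
```
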
Upvotes: 1

linog

Reputation: 6226

You can use dplyr::case_when. It's better than chaining conditions:

library(dplyr)

# Example input matching the output below
df <- data.frame(month = 1:13)

# Conditions are checked in order; months 1-3 are winter, as in the question
df %>% mutate(season = case_when(
  month < 4 ~ "winter",
  month < 7 ~ "spring",
  month < 10 ~ "summer",
  month < 13 ~ "fall",
  TRUE ~ NA_character_
))

   month season
1      1 winter
2      2 winter
3      3 winter
4      4 spring
5      5 spring
6      6 spring
7      7 summer
8      8 summer
9      9 summer
10    10   fall
11    11   fall
12    12   fall
13    13   <NA>
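
For comparison, base R's cut() can express the same kind of threshold binning without dplyr. This is a different technique from case_when, sketched here with the month groupings from the question (1-3 winter, 4-6 spring, 7-9 summer, 10-12 fall); the df below is a made-up example:

```r
# Hypothetical months, including one out-of-range value
df <- data.frame(month = c(1, 3, 4, 6, 7, 9, 10, 12, 13))

# Intervals are (0,3], (3,6], (6,9], (9,12]; values outside them become NA
df$season <- as.character(cut(
  df$month,
  breaks = c(0, 3, 6, 9, 12),
  labels = c("winter", "spring", "summer", "fall")
))
print(df)
```

cut() returns a factor, so the as.character() call is only there to get a plain character column like the case_when version produces.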

Upvotes: 0
