Reputation: 1

R - Filter data by month

I apologize for my bad English, but I really need your help.

I have a .csv dataset with two columns, year and value. It contains monthly precipitation depths from 1900 to 2019.

It looks like this:

year    value
190001  100
190002  39
190003  78
190004  45
...
201912  25

I need to create two new datasets: the first with the data for July (07) through September (09) of every year, and the second with the data for January (01) through March (03).

I also need to summarize this data for every year, so that I end up with only one value per year.

In the end I should have one dataset for summer 1900-2019 and one for winter 1900-2019.

Upvotes: 0

Answers (2)

Phil

Reputation: 8127

library(tidyverse)

dat <- tribble(
  ~year,    ~value,
  190001,  100,
  190002,  39,
  190003,  78,
  190004,  45)

Split the combined year variable into separate month and year variables:

dat_prep <- dat %>% 
  mutate(month = str_remove(year, "^\\d{4}"), # Remove the first 4 digits
         year = str_remove(year, "\\d{2}$"), # Remove the last 2 digits
         across(everything(), as.numeric))

dat_prep %>% 
  filter(month %in% 7:9) %>% # For months Jul-Sep. Repeat with 1:3 for Jan-Mar
  group_by(year) %>% 
  summarize(value = sum(value))
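If you would rather get both seasons in one pass instead of repeating the filter, here is a minimal sketch. It assumes a data frame that is already split into year and month like dat_prep; the prep rows and the season labels are made up for illustration.

```r
library(dplyr)

# Hypothetical rows, already split into year and month like dat_prep
prep <- data.frame(
  year  = c(1900, 1900, 1900, 1900, 1900, 1900, 1901, 1901),
  month = c(1, 2, 3, 4, 7, 9, 1, 8),
  value = c(100, 39, 78, 45, 10, 30, 5, 7)
)

seasonal <- prep %>%
  mutate(season = case_when(
    month %in% 1:3 ~ "winter",  # Jan-Mar
    month %in% 7:9 ~ "summer"   # Jul-Sep
  )) %>%
  filter(!is.na(season)) %>%    # drop months outside both windows
  group_by(season, year) %>%
  summarise(value = sum(value), .groups = "drop")

# Split back into the two datasets the question asks for
summer <- filter(seasonal, season == "summer")
winter <- filter(seasonal, season == "winter")
```

Months that match neither window get an NA season and are dropped before the sum, so each resulting row is one season of one year.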

Upvotes: 0

slava-kohut

Reputation: 4233

You can use the dplyr and stringr packages to achieve what you need. I created a mock data set first:

library(dplyr)
library(stringr)

time <- rep(1900:2019, each = 12) * 100 + 1:12 # valid YYYYMM codes only: 190001, ..., 201912
df <- data.frame(time = time, value = runif(length(time), 0, 100))

After that, we create two separate columns for month and year:

df$year <- as.numeric(str_extract(df$time, "^...."))
df$month <- as.numeric(str_extract(df$time, "..$"))
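Since the codes are plain numbers, the same split can also be done without regular expressions. This is just a sketch of an alternative, assuming every value is a six-digit YYYYMM number; the sample vector is made up.

```r
# Hypothetical sample values in the question's YYYYMM layout
time <- c(190001, 190007, 201912)

year  <- time %/% 100  # integer division drops the last two digits
month <- time %% 100   # remainder keeps the last two digits
```

This skips the round trip through character strings, but it only works as long as the column really is numeric.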

At this point, we can filter:

df_1 <- df %>% filter(between(month,7,9))
df_2 <- df %>% filter(between(month,1,3))

... and summarize:

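df_1 <- df_1 %>% group_by(year) %>% summarise(value = sum(value)) # Jul-Sep total per year
df_2 <- df_2 %>% group_by(year) %>% summarise(value = sum(value)) # Jan-Mar total per year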

Upvotes: 1
