crich

Reputation: 99

How to remove multiple year-months from a dataset

I have this data set, but I only want months 1, 2, 3, and 12, for every year in which those months occur. The date format is year-month, and I need to keep it that way so I can eventually merge with another data set. Thank you for your help.

# write the webscraper
library(XML)
library(RCurl)
library(dplyr)
library('zoo')
library('tidyverse')
library('lubridate')
avalanche<-data.frame()
avalanche.url<-"https://utahavalanchecenter.org/observations?page="
all.pages<-0:202
for(page in all.pages){
  this.url<-paste(avalanche.url, page, sep="")
  this.webpage<-htmlParse(getURL(this.url))
  thispage.avalanche<-readHTMLTable(this.webpage, which=1, header=TRUE, stringsAsFactors=FALSE)
  names(thispage.avalanche)<-c('Date','Region','Location','Observer')
  avalanche<-rbind(avalanche,thispage.avalanche)
}

# subset the data to the Salt Lake Region
avalancheslc<-subset(avalanche, Region=="Salt Lake")
str(avalancheslc)

# convert the dates and total the number of avalanches
avalancheslc <- avalancheslc %>% 
          group_by(Date = format(as.yearmon(Date, "%m/%d/%Y"), "%Y-%m")) %>% 
          summarise(AvalancheTotal = n())
# pipe to only include Dec-Mar of each year
avalancheslc <- avalancheslc %>% filter(as.integer(substr(Date, 6, 7)) %in% c(12, 1:3))




avalancheslc <- avalancheslc %>% mutate(Date = parse_date_time(Date, "%y-%m"))


# A full data frame of months
all_months <- avalancheslc %>% expand(Date = seq(first(Date), last(Date), by = "month"))

# Join to `avalanches` and fill in with 0s
avalancheslc <- avalancheslc %>% right_join(all_months) %>% replace_na(list(AvalancheTotal = 0))

# convert date back to Year-Month format
avalancheslc$Date<-format(avalancheslc$Date, "%Y-%m")


The result should look something like this:

Date     AvalancheTotal
1980-01               1
1980-02               0
1980-03               0
1980-12               0
1981-01               0
1981-02               1
...
2019-03             163

Upvotes: 0

Views: 62

Answers (1)

Croote

Reputation: 1424

You can use the lubridate package for this. Its month() function extracts the month as an integer, so you can filter on the months you need. The column has to still be a date (or date-time) when you filter, so do this before converting it back to a "%Y-%m" string:

library(dplyr)
library(lubridate)
df %>%
  filter(month(date) %in% c(1, 2, 3, 12))
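
Applied to the question's pipeline, the same idea might look like the sketch below. It assumes the filter should run after the missing months are filled in with zeros; in the question the filter runs first, so the later join appears to bring the other months back. The small `avalancheslc` tibble here is made-up stand-in data, not the scraped results, and `winter` is just an illustrative name.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical monthly totals standing in for the scraped data
avalancheslc <- tibble(
  Date = c("1980-01", "1981-02", "1981-12"),
  AvalancheTotal = c(1L, 1L, 2L)
)

winter <- avalancheslc %>%
  # "1980-01" -> 1980-01-01, so month() and seq() can work on real dates
  mutate(Date = ymd(paste0(Date, "-01"))) %>%
  # add every month between the first and last date, with 0 avalanches
  complete(Date = seq(min(Date), max(Date), by = "month"),
           fill = list(AvalancheTotal = 0L)) %>%
  # keep only December through March
  filter(month(Date) %in% c(12, 1:3)) %>%
  # convert back to year-month text for the later merge
  mutate(Date = format(Date, "%Y-%m"))

print(winter)
```

Filtering last means the zero-filled months outside December to March are dropped for good, and Date ends up as a "%Y-%m" string again, ready to merge.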

Upvotes: 1
