purple1437

Reputation: 313

How to loop through year and month in an R function

I have a web scraping function which takes a year (term) and a month (term2) and returns the news headlines as a data frame:

news <- function(term, term2) {
  
  html_dat <- read_html(paste0("https://news.google.com/search?q=site%3%2F",term,"%2F",term2,"&hl=en-US&gl=US&ceid=US%3Aen"))

  news_dat <- data.frame(
    Title = html_dat %>%
      html_nodes("a.DY5T1d") %>% 
      html_text()
  ) 

  return(news_dat)
}

df <- news('2020', '05')

I would like to create a loop that calls the function for every year from 2000 to 2021 and, within each year, for every month. For example, the loop would first call news('2000', '01'), then news('2000', '02'), and so on.

I would like to return a data frame or list containing all the headlines for that period. The code I have, which does not work:


years <- 2000:2021
months <- 1:12

for (i in length(years)){
  for (j in length(months)){
    temp <- news(i,j)
  }
  newdf <- rbind(newdf, temp)
}

Upvotes: 0

Views: 1315

Answers (2)

norie

Reputation: 9857

You could use map2_dfr from the purrr package for this.

library(purrr)  # map2_dfr(), cross_df()
library(rvest)  # read_html(), html_nodes(), html_text(), %>%

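news <- function(term, term2) {

  # Build the search URL for one year/month pair and download the page
  url <- paste0("https://news.google.com/search?q=site%3%2F",term,"%2F",term2,"&hl=en-US&gl=US&ceid=US%3Aen")
  html_dat <- read_html(url)

  # Extract the headline text into a one-column data frame
  news_dat <- data.frame(
    Title = html_dat %>%
      html_nodes("a.DY5T1d") %>%
      html_text()
  )

  # Return the result explicitly rather than relying on the invisible value of the assignment
  news_dat
}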

years <- 2000:2021
months <- 1:12

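# Every year/month combination, one row per pair
# (cross_df() is deprecated as of purrr 1.0.0; tidyr::expand_grid() is the suggested replacement)
crossArg <- cross_df(list(year=years, month=months))

# Call news() once per row and row-bind the results into a single data frame
df <- map2_dfr(crossArg$year, crossArg$month, news)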
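
For readers without purrr, the same pattern can be sketched in base R. This is only an illustration under some assumptions: news_stub is a hypothetical stand-in for news() so that the code runs without network access, and the months are zero-padded with sprintf because the question calls news('2020', '05') rather than news('2020', '5').

```r
# Hypothetical stand-in for news(): returns one fake headline per call,
# so the sketch runs offline. Swap in the real news() to scrape.
news_stub <- function(term, term2) {
  data.frame(Title = paste("headline for", term, term2),
             stringsAsFactors = FALSE)
}

years  <- 2000:2021
months <- sprintf("%02d", 1:12)   # "01", "02", ..., "12"

# All year/month combinations; the first column (month) varies fastest
grid <- expand.grid(month = months, year = years, stringsAsFactors = FALSE)

# Call the function once per row, then stack the pieces into one data frame
pieces   <- Map(news_stub, grid$year, grid$month)
all_news <- do.call(rbind, pieces)
```

With the stub, all_news has one row per year/month pair. With the real scraper each call can return any number of rows, and do.call(rbind, ...) stacks them in the same way.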

Upvotes: 1

Limey

Reputation: 12461

The important thing to remember is that R is vectorised: it is designed to work on whole columns at once. For example, to add 1 to every element of a vector, it's sufficient to write

x <- 1:10
x <- x + 1

which gives a vector whose first element is 2 and last element is 11. So, when you find yourself writing code to loop through rows of a vector/matrix/data frame, stop. There is almost certainly a better way*.

*: There are rare, very rare, exceptions. This is not one of them.

library(tidyverse)

newdf <- tibble() %>% expand(year=2000:2021, month=1:12)
newdf
# A tibble: 264 x 2
    year month
   <int> <int>
 1  2000     1
 2  2000     2
 3  2000     3
 4  2000     4
 5  2000     5
 6  2000     6
 7  2000     7
 8  2000     8
 9  2000     9
10  2000    10
# … with 254 more rows

which I believe is what you want.

Edit: To forestall the OP's request for conversion to character, which they hint at in one of their comments:

newdf <- tibble() %>% 
           expand(year=2000:2021, month=1:12) %>% 
           mutate(year=as.character(year), month=as.character(month))

or

newdf <- tibble() %>% 
           expand(year=as.character(2000:2021), month=as.character(1:12))

Though I believe this is unnecessary.
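
As a rough illustration of the column-wise point, the year/month part of each request can be built for all 264 rows in a single vectorised call. This is a base R sketch, not part of the original answer: the column name query is made up for the example, and zero-padding the month is an assumption based on the question's news('2020', '05') call.

```r
# All year/month combinations; the first column (month) varies fastest
grid <- expand.grid(month = 1:12, year = 2000:2021)

# paste0() and sprintf() are vectorised, so this fills the whole column at once.
# "query" is a hypothetical column name holding the "<year>%2F<month>" fragment.
grid$query <- paste0(grid$year, "%2F", sprintf("%02d", grid$month))

head(grid$query, 3)
```

The downloads themselves still need one request per row, so that step has to call the scraping function once for each row of the grid.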

Upvotes: 0
