Scrape multiple tables from Wikipedia in R

Question

I am trying to scrape the content of this Wikipedia page using the rvest library in R:

(https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2019)

I want to extract the 4 tables that contain data on the release of Bollywood films in 2019 (January–March, April–June, July–September, October–December).

What I have done so far:

library(rvest)
url <- "https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2019"
webpage <- read_html(url)
tbls <- html_nodes(webpage, "table")

# Then I match on the word "Opening" and get the 4 tables shown on the Wikipedia page;
# however, I am struggling to combine them into one data frame and store it

tbls[grep("Opening", tbls, ignore.case = TRUE)]

This gives an error:

df <- html_table(tbls[grep("Opening", tbls, ignore.case = TRUE)], fill = TRUE)

I suspect this is because it returned multiple tables and I am missing a subscript somewhere, but I am not sure where. Help!
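Whatever the exact error message, one thing worth knowing is that `html_table()` applied to several table nodes returns a list with one data frame per table, not a single data frame, so the list still has to be stacked. The sketch below shows this on a small inline HTML stand-in rather than the live page; the markup, film names and the `films` variable are made up, and it assumes rvest 1.0 or later (for `minimal_html()` and `html_elements()`).

```r
library(rvest)

# Hypothetical stand-in for the Wikipedia page: two tables with an "Opening"
# header and identical columns, plus one unrelated table
html <- minimal_html('
  <table><tr><th>Opening</th><th>Title</th></tr>
         <tr><td>JAN 4</td><td>Film A</td></tr></table>
  <table><tr><th>Opening</th><th>Title</th></tr>
         <tr><td>APR 5</td><td>Film B</td></tr></table>
  <table><tr><th>Rank</th><th>Gross</th></tr>
         <tr><td>1</td><td>100</td></tr></table>')

tbls <- html_elements(html, "table")

# Keep only the tables whose text mentions "Opening"
keep <- grepl("Opening", html_text(tbls), ignore.case = TRUE)

# html_table() on a node set returns a list: one data frame per table
df_list <- html_table(tbls[keep])

# Stack the list into one data frame (column names must match)
films <- do.call(rbind, df_list)
```

Indexing the list (for example `df_list[[1]]`) gives a single quarter's table; `do.call(rbind, ...)` is what turns the list into one data frame.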

Mislav · Accepted Answer

For complicated HTML tables, I recommend the htmltab package:

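library(purrr)
library(htmltab)

url <- "https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2019"
# Tables 4 to 7 were the four quarterly release tables on the page when this was written;
# map2() recycles the single url across each table index
tbls <- map2(url, 4:7, htmltab)
# Stack the four data frames into one (requires identical column names)
tbls <- do.call(rbind, tbls)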
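One caveat with `do.call(rbind, tbls)`: it stops with an error if the tables do not share exactly the same column names (for example, if one quarter's table gains an extra column). A hedged alternative, assuming dplyr is installed, is `bind_rows()`, which matches columns by name, fills missing ones with `NA`, and can record which table each row came from. The small data frames and quarter labels below are made-up stand-ins for the scraped tables.

```r
library(dplyr)

# Hypothetical per-quarter tables; the second has a column the first lacks
quarters <- list(
  "Jan-Mar" = data.frame(Title = c("Film A", "Film B"), Director = c("X", "Y")),
  "Apr-Jun" = data.frame(Title = "Film C", Director = "Z", Genre = "Drama")
)

# do.call(rbind, quarters) would fail here because the columns differ.
# bind_rows() aligns columns by name, fills gaps with NA, and .id adds a
# column holding the list names so each row keeps its quarter label
films <- bind_rows(quarters, .id = "quarter")
```

With a named list of scraped tables, the `quarter` column gives you the release period for free.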
