Sharif Amlani

Reputation: 1278

Webscraping Tables From Wikipedia in R

I was wondering if anyone had useful ideas or code for web scraping tables from Wikipedia.

Specifically, I'm interested in the Presidential election results table in the "Results by county" section on Wikipedia.

An example table can be found using the following link and scrolling down to the "Results by county" section: https://en.wikipedia.org/wiki/1948_United_States_presidential_election_in_Texas

I've tried some solutions from the following Stack Overflow post: Importing Wikipedia tables in R

However, they don't appear to be applicable to the type of table I want to scrape.

Any advice, solutions, or code would be greatly appreciated. Thank you!

Upvotes: 1

Views: 474

Answers (1)

stefan

Reputation: 124048

Using the rvest package, you can get the table by first selecting the element that contains it with html_element("table.wikitable.sortable") (which returns only the first matching element) and then extracting it with html_table(), like so:

library(rvest)

url <- "https://en.wikipedia.org/wiki/1948_United_States_presidential_election_in_Texas"

html <- read_html(url)

county_table <- html %>% 
  html_element("table.wikitable.sortable") %>% 
  html_table()

head(county_table)
#> # A tibble: 6 x 14
#>   County  `Harry S. Truman… `Harry S. Truman… `Thomas E. Dewey… `Thomas E. Dewe…
#>   <chr>   <chr>             <chr>             <chr>             <chr>           
#> 1 County  #                 %                 #                 %               
#> 2 Anders… 3,242             62.37%            1,199             23.07%          
#> 3 Andrews 816               85.27%            101               10.55%          
#> 4 Angeli… 4,377             69.05%            1,000             15.78%          
#> 5 Aransas 418               61.02%            235               34.31%          
#> 6 Archer  1,599             86.20%            191               10.30%          
#> # … with 9 more variables: Strom ThurmondStates’ Rights Democratic <chr>,
#> #   Strom ThurmondStates’ Rights Democratic.1 <chr>,
#> #   Henry A. WallaceProgressive <chr>, Henry A. WallaceProgressive.1 <chr>,
#> #   Various candidatesOther parties <chr>,
#> #   Various candidatesOther parties.1 <chr>, Margin <chr>, Margin.1 <chr>,
#> #   Total votes cast[11] <chr>
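
Note that in the output above the first row repeats the "#" / "%" sub-header and every column is read as character. If you want numeric columns, a cleanup step along the following lines might help. This is only a minimal sketch: clean_county_table() is a hypothetical helper (not part of rvest), and the toy data frame with the column names truman_n and truman_pct is made up for illustration using a few values from the output above. It assumes the first column holds the county name and that all other columns contain counts or percentages.

```r
# Hypothetical helper: tidy a scraped results table whose first row
# repeats the "#"/"%" sub-header and whose values are all character.
clean_county_table <- function(df) {
  df <- as.data.frame(df)

  # Drop the repeated sub-header row and reset the row names
  df <- df[-1, , drop = FALSE]
  rownames(df) <- NULL

  to_number <- function(x) {
    x <- gsub("\u2212", "-", x, fixed = TRUE)  # Unicode minus -> ASCII hyphen
    x <- gsub("[,%]", "", x)                   # strip thousands separators and percent signs
    as.numeric(x)                              # anything non-numeric becomes NA (with a warning)
  }

  # Assumption: every column except the first (county name) is numeric
  df[-1] <- lapply(df[-1], to_number)
  df
}

# Toy input mimicking the shape of the scraped table (illustrative names)
raw <- data.frame(
  County     = c("County", "Andrews", "Aransas", "Archer"),
  truman_n   = c("#", "816", "418", "1,599"),
  truman_pct = c("%", "85.27%", "61.02%", "86.20%"),
  stringsAsFactors = FALSE
)

clean <- clean_county_table(raw)
str(clean)
```

With the real data you would call clean_county_table(county_table) instead. Check the result for NA values afterwards, since cells that contain anything other than a number (a footnote marker, for example) will not convert.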

Upvotes: 5
