Reputation: 1278
I was wondering if anyone had useful ideas or code for web scraping tables from Wikipedia.
Specifically, I'm interested in the presidential election results table in the "Results by county" section of a state's election article.
An example table can be found using the following link and scrolling down to the "Results by county" section: https://en.wikipedia.org/wiki/1948_United_States_presidential_election_in_Texas
I've tried some solutions from the following StackOverflow post: Importing wikipedia tables in R
However, they don't appear to be applicable to the type of table I want to scrape from Wikipedia.
Any advice, solutions, or code would be greatly appreciated. Thank you!
Upvotes: 1
Views: 474
Reputation: 124048
Using the rvest package, you can get the table by first selecting the element that contains it via html_element("table.wikitable.sortable") and then extracting it via html_table(), like so:
library(rvest)
url <- "https://en.wikipedia.org/wiki/1948_United_States_presidential_election_in_Texas"
html <- read_html(url)
county_table <- html %>%
  html_element("table.wikitable.sortable") %>%
  html_table()
head(county_table)
#> # A tibble: 6 x 14
#> County `Harry S. Truman… `Harry S. Truman… `Thomas E. Dewey… `Thomas E. Dewe…
#> <chr> <chr> <chr> <chr> <chr>
#> 1 County # % # %
#> 2 Anders… 3,242 62.37% 1,199 23.07%
#> 3 Andrews 816 85.27% 101 10.55%
#> 4 Angeli… 4,377 69.05% 1,000 15.78%
#> 5 Aransas 418 61.02% 235 34.31%
#> 6 Archer 1,599 86.20% 191 10.30%
#> # … with 9 more variables: Strom ThurmondStates’ Rights Democratic <chr>,
#> # Strom ThurmondStates’ Rights Democratic.1 <chr>,
#> # Henry A. WallaceProgressive <chr>, Henry A. WallaceProgressive.1 <chr>,
#> # Various candidatesOther parties <chr>,
#> # Various candidatesOther parties.1 <chr>, Margin <chr>, Margin.1 <chr>,
#> # Total votes cast[11] <chr>
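Note that in the output above the first data row repeats the sub-header labels (#, %), and every column comes back as character. Here is a minimal base-R sketch of one way to tidy that. It uses a small hand-built data frame in place of the scraped tibble so it runs without a network connection; the short column names and the to_number helper are illustrative assumptions, not part of rvest.

```r
# Toy stand-in for the scraped table: row 1 repeats the sub-header labels,
# and all values are character strings (as in the html_table() output).
# Column names are shortened, hypothetical stand-ins for the real ones.
raw <- data.frame(
  County       = c("County", "Andrews", "Aransas", "Archer"),
  truman_votes = c("#", "816", "418", "1,599"),
  truman_pct   = c("%", "85.27%", "61.02%", "86.20%"),
  stringsAsFactors = FALSE
)

# Drop the repeated sub-header row.
clean <- raw[-1, ]
rownames(clean) <- NULL

# Hypothetical helper: strip thousands separators and percent signs,
# then convert to numeric.
to_number <- function(x) as.numeric(gsub("[,%]", "", x))

clean$truman_votes <- to_number(clean$truman_votes)
clean$truman_pct   <- to_number(clean$truman_pct)

print(clean)
```

On the real county_table the same idea should apply to every column except County, for example by looping over names(county_table)[-1]; check the last rows too, since such tables often end with a totals row.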
Upvotes: 5