lena_sr

Reputation: 33

Is there a better way to scrape a Wikipedia page in R?

I am working with a data set that contains the US states, and I tried to scrape the Wikipedia page "List of United States governors" to distinguish Democratic from Republican states.

My code looks like this so far:

library(tidyverse)
library(dplyr)
library(tidyr)
library(readr)
library(rvest)
library(htmltab)
library(lubridate)

corona_usa_simple <- readr::read_csv("https://raw.githubusercontent.com/datasets/covid-19/master/data/us_simplified.csv")

corona_us_states <- corona_usa_simple %>% 
select(- FIPS, - Admin2, -`Country/Region`) %>%  rename(State=`Province/State`)

wiki_govenors <- htmltab("https://en.wikipedia.org/wiki/List_of_United_States_governors") %>% rename(State=`Democratic(24)  Republican(26) >> State`)

Before merging the data sets, I wanted to rename the first column so that it is called "State" in both. However, I get the error "Can't rename columns that don't exist." Is there a better way to scrape the wiki page so that not every column name starts with "Democratic(24) Republican(26)"?

Upvotes: 0

Views: 162

Answers (1)

juljo

Reputation: 674

You can specify the header row in the htmltab() call. With header = 2, the columns are named correctly, but the "Democratic(24) Republican(26)" line ends up as the first data row. To remove it, use slice(-1) from dplyr.

wiki_governors <- htmltab("https://en.wikipedia.org/wiki/List_of_United_States_governors",
                          header = 2) %>%
  slice(-1)

The resulting data:

head(wiki_governors)

       State       Governor Party    Party.1                       Born
1    Alabama       Kay Ivey      Republican October 15, 1944 (age 75)
2     Alaska  Mike Dunleavy      Republican      May 5, 1961 (age 59)
3    Arizona     Doug Ducey      Republican    April 9, 1964 (age 56)
4   Arkansas Asa Hutchinson      Republican December 3, 1950 (age 69)
5 California   Gavin Newsom      Democratic October 10, 1967 (age 52)
6   Colorado    Jared Polis      Democratic     May 12, 1975 (age 45)
                                                                                                                                     Prior public experience
1                                                                                                                             Lieutenant Governor, Treasurer
2                                                                                                                                              Alaska Senate
3                                                                                                                                                  Treasurer
4 Under Secretary of Homeland Security for Border & Transportation Security, Administrator of the Drug Enforcement Administration, U.S. House, U.S. Attorney
5                                                                                                                Lieutenant Governor, Mayor of San Francisco
6                                                                                                              U.S. House, Colorado State Board of Education
      Inauguration        End of term Past governors
1   April 10, 2017               2023           List
2 December 3, 2018               2022           List
3  January 5, 2015 2023 (term limits)           List
4 January 13, 2015 2023 (term limits)           List
5  January 7, 2019               2023           List
6  January 8, 2019               2023           List
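
For the merge step mentioned in the question, a minimal sketch could look like the following. It assumes the scraped table has the columns shown in the output above, with the party name in Party.1, and it uses small toy data frames instead of the real data. The Confirmed column, its values, and the names state_party and corona_with_party are hypothetical and only there for illustration.

```r
library(dplyr)

# Toy stand-in for the scraped table. Column names follow the output above:
# "Party" is an empty cell and "Party.1" holds the party name.
wiki_governors <- data.frame(
  State   = c("Alabama", "Alaska", "California"),
  Party   = c("", "", ""),
  Party.1 = c("Republican", "Republican", "Democratic"),
  stringsAsFactors = FALSE
)

# Toy stand-in for corona_us_states (hypothetical column and values).
corona_us_states <- data.frame(
  State     = c("Alabama", "California", "California", "Guam"),
  Confirmed = c(10, 20, 30, 5),
  stringsAsFactors = FALSE
)

# Keep only the join key and the party, renaming Party.1 to Party.
state_party <- wiki_governors %>%
  select(State, Party = Party.1)

# left_join keeps every row of the case data; rows without a
# matching state in the governors table get NA for Party.
corona_with_party <- corona_us_states %>%
  left_join(state_party, by = "State")

print(corona_with_party)
```

A left join seems a reasonable choice here because it keeps rows for territories that may not match a row in the governors table, so they show up with a missing party rather than being dropped silently.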

Upvotes: 1
