The answer is probably not far-fetched, and I apologize for that in advance. I'm doing a basic web-scraping exercise based on code I found online, with my own twist so that I understand what I'm writing. I managed to add Year, Date and President columns, but I'm struggling to add each US president's party to my data frame. The result is always the same: every president is labelled as Republican.
Here's my code:
library(rvest)
library(tidyr)
library(dplyr)

# Download the index page of State of the Union texts
pres.library <- read_html(x = "http://stateoftheunion.onetwothree.net/texts/index.html")

# Link targets, one per speech
links <- pres.library %>%
  html_nodes("#text li a") %>%
  html_attr("href")

# Link text, split below into President, Date and Year at the commas
text <- pres.library %>%
  html_nodes("#text li a") %>%
  html_text()

sotu <- data.frame(text = text, links = links, stringsAsFactors = FALSE) %>%
  separate(text, c("President", "Date", "Year"), ",")

# Keep only the modern speeches (drop the first 156 rows)
sotu.modern <- sotu[-c(1:156), ]

democrats <- c("Harry S. Truman", "John F. Kennedy", "Lyndon B. Johnson",
               "Jimmy Carter", "William J. Clinton", "Barack Obama")
And here's the ifelse statement:
sotu.modern$Party <- ifelse(sotu.modern$President %in% democrats, "Democrats", "Republican")
I also tried dplyr's if_else() function and a classic if {} else {} loop, but the result is always the same.
Thanks in advance.
I ran your code and looked at unique(sotu$President). It looks like every name has a leading space, and Obama's also has a trailing space:
> unique(sotu$President)
[1] " George Washington" " John Adams" " Thomas Jefferson" " James Madison"
[5] " James Monroe" " John Quincy Adams" " Andrew Jackson" " Martin van Buren"
[9] " John Tyler" " James Polk" " Zachary Taylor" " Millard Fillmore"
[13] " Franklin Pierce" " James Buchanan" " Abraham Lincoln" " Andrew Johnson"
[17] " Ulysses S. Grant" " Rutherford B. Hayes" " Chester A. Arthur" " Grover Cleveland"
[21] " Benjamin Harrison" " William McKinley" " Theodore Roosevelt" " William H. Taft"
...
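A quick check that would have flagged this early is setdiff(), which lists the lookup names that never occur in the data. Below is a minimal sketch; the scraped vector and the shortened democrats vector are hypothetical samples made up to mimic the padded names above, not the real scraped data.

```r
# Hypothetical sample mimicking the scraped names (note the padding)
scraped <- c(" Harry S. Truman", " Dwight D. Eisenhower", " Barack Obama ")
democrats <- c("Harry S. Truman", "Barack Obama")

# %in% compares whole strings exactly, so no padded name matches
scraped %in% democrats
# [1] FALSE FALSE FALSE

# Lookup names that never appear in the data: every Democrat is "missing"
setdiff(democrats, scraped)
# [1] "Harry S. Truman" "Barack Obama"
```

If setdiff() returns names you know are in the data, the strings differ somehow (whitespace, punctuation, spelling) and are worth inspecting before blaming ifelse().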
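Because %in% only matches identical strings, none of these padded names is found in democrats, which is why ifelse() (and if_else(), or an if/else loop) labels every row "Republican". Use the base function trimws() or stringr::str_trim() to remove the extra whitespace; both strip leading and trailing whitespace by default.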
# Strip leading/trailing whitespace before subsetting and matching
sotu$President <- trimws(sotu$President)
sotu.modern <- sotu[-c(1:156),]
sotu.modern$Party <- ifelse(sotu.modern$President %in% democrats, "Democrats", "Republican")
sotu.modern
# President Date Year links Party
# 157 Harry S. Truman January 21 1946 19460121.html Democrats
# 158 Harry S. Truman January 6 1947 19470106.html Democrats
# 159 Harry S. Truman January 7 1948 19480107.html Democrats
# 160 Harry S. Truman January 5 1949 19490105.html Democrats
# 161 Harry S. Truman January 4 1950 19500104.html Democrats
# 162 Harry S. Truman January 8 1951 19510108.html Democrats
# 163 Harry S. Truman January 9 1952 19520109.html Democrats
# 164 Harry S. Truman January 7 1953 19530107.html Democrats
# 165 Dwight D. Eisenhower February 2 1953 19530202.html Republican
# 166 Dwight D. Eisenhower January 7 1954 19540107.html Republican
...
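The Date and Year columns come from the same comma split, so they likely carry the same leading space. Below is a sketch of one way to trim every column right after separate(), assuming dplyr 1.0 or later for across(); the link text in sample_text is hypothetical, written to mimic the scraped format rather than copied from the page.

```r
library(tidyr)
library(dplyr)

# Hypothetical link text mimicking the scraped format, padding included
sample_text <- c(" Harry S. Truman, January 21, 1946",
                 " Dwight D. Eisenhower, February 2, 1953",
                 " Barack Obama , January 20, 2015")
democrats <- c("Harry S. Truman", "Barack Obama")

sotu_demo <- data.frame(text = sample_text, stringsAsFactors = FALSE) %>%
  separate(text, c("President", "Date", "Year"), sep = ",") %>%
  # Strip leading/trailing whitespace from every column at once
  mutate(across(everything(), trimws)) %>%
  mutate(Party = ifelse(President %in% democrats, "Democrats", "Republican"))

sotu_demo
```

Trimming straight after the split means every later comparison or filter works on clean strings. If a page ever pads text with non-breaking spaces, the default trimws() will not remove them; trimws(x, whitespace = "[\\h\\v]") (R 3.6.0 or later) also handles those.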