bolleke

Reputation: 151

ifelse statement only returns else value (same with dplyr's if_else, loop)

The answer is probably not too far-fetched, and I apologize for that in advance. I'm doing a basic web scraping exercise based on code I found online, with my own twist so that I understand what I'm writing. I managed to add Year, Date and President columns, but I'm struggling to add each US president's party to my data frame. The result is always the same: all presidents are labelled as Republicans.

Here's my code

library(rvest)
library(tidyr)
library(dplyr)

pres.library <- read_html(x = "http://stateoftheunion.onetwothree.net/texts/index.html")

links <- pres.library %>%
  html_nodes("#text li a") %>%
  html_attr("href")

text <- pres.library %>%
  html_nodes("#text li a") %>%
  html_text()

sotu <- data.frame(text = text, links = links, stringsAsFactors = FALSE) %>%
  separate(text, c("President", "Date", "Year"), ",")

sotu.modern <- sotu[-c(1:156),]

democrats <- c("Harry S. Truman", "John F. Kennedy", "Lyndon B. Johnson", "Jimmy Carter", "William J. Clinton", "Barack Obama")

And here's the ifelse statement:

sotu.modern$Party <- ifelse(sotu.modern$President %in% democrats, "Democrats", "Republican")

I tried dplyr's if_else function and a classic if {} else {} construct, but the result is always the same.

Thanks in advance

Upvotes: 0

Views: 121

Answers (1)

Gregor Thomas

Reputation: 145755

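I ran your code and looked at unique(sotu$President). Every name has a leading space, and Obama also has a trailing space. Because %in% matches strings exactly, " Harry S. Truman" is never found in democrats, so every row falls through to the else value: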

> unique(sotu$President)
 [1] " George Washington"     " John Adams"            " Thomas Jefferson"      " James Madison"        
 [5] " James Monroe"          " John Quincy Adams"     " Andrew Jackson"        " Martin van Buren"     
 [9] " John Tyler"            " James Polk"            " Zachary Taylor"        " Millard Fillmore"     
[13] " Franklin Pierce"       " James Buchanan"        " Abraham Lincoln"       " Andrew Johnson"       
[17] " Ulysses S. Grant"      " Rutherford B. Hayes"   " Chester A. Arthur"     " Grover Cleveland"     
[21] " Benjamin Harrison"     " William McKinley"      " Theodore Roosevelt"    " William H. Taft"
...  
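
To see why the match fails, here is a small self-contained sketch. The vector `x` is a made-up stand-in for the scraped names, not the real data, and the variable names are only for illustration:

```r
# Hypothetical stand-in for the scraped President column
x <- c(" Harry S. Truman", "Barack Obama ", "Jimmy Carter")
democrats <- c("Harry S. Truman", "Barack Obama", "Jimmy Carter")

# %in% compares whole strings, so padded names do not match
matches <- x %in% democrats

# Flag elements that start or end with whitespace
has_pad <- grepl("^\\s|\\s$", x)

# Count how many padding characters trimws() would remove from each element
pad_len <- nchar(x) - nchar(trimws(x))

print(data.frame(x = x, matches = matches, has_pad = has_pad, pad_len = pad_len))
```

Printing the vector with quotes, as unique() does above, is usually enough to spot the problem; the grepl() check is just a way to confirm it programmatically.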

Use the base function trimws() or stringr::str_trim() to remove the whitespace.

sotu$President = trimws(sotu$President)
sotu.modern <- sotu[-c(1:156),]
sotu.modern$Party <- ifelse(sotu.modern$President %in% democrats, "Democrats", "Republican")
sotu.modern
#                President         Date Year         links      Party
# 157      Harry S. Truman   January 21 1946 19460121.html  Democrats
# 158      Harry S. Truman    January 6 1947 19470106.html  Democrats
# 159      Harry S. Truman    January 7 1948 19480107.html  Democrats
# 160      Harry S. Truman    January 5 1949 19490105.html  Democrats
# 161      Harry S. Truman    January 4 1950 19500104.html  Democrats
# 162      Harry S. Truman    January 8 1951 19510108.html  Democrats
# 163      Harry S. Truman    January 9 1952 19520109.html  Democrats
# 164      Harry S. Truman    January 7 1953 19530107.html  Democrats
# 165 Dwight D. Eisenhower   February 2 1953 19530202.html Republican
# 166 Dwight D. Eisenhower    January 7 1954 19540107.html Republican
...
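
Since the names were split on "," alone, the other split columns may carry the same padding. As a hedged sketch on made-up toy data (the data frame `toy` and its values are assumptions for illustration, not the scraped table), all character columns can be trimmed in one pass before labelling:

```r
# Toy data standing in for the scraped table; values are illustrative only
toy <- data.frame(
  President = c(" Harry S. Truman", " Dwight D. Eisenhower", " Barack Obama "),
  Date      = c(" January 21", " February 2", " January 12"),
  Year      = c(" 1946", " 1953", " 2016"),
  stringsAsFactors = FALSE
)

# Trim leading and trailing whitespace in every character column
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], trimws)

# With clean names, the exact match in %in% behaves as intended
democrats <- c("Harry S. Truman", "Barack Obama")
toy$Party <- ifelse(toy$President %in% democrats, "Democrats", "Republican")

print(toy)
```

Trimming right after the split keeps later steps, such as converting Year to a number or matching on President, from tripping over the same hidden spaces.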

Upvotes: 2
