Reputation: 53
I am trying to create a data frame with the following variables. However, after using the SelectorGadget tool to determine the CSS selectors needed to scrape this information, the resulting vectors have different lengths, even when I copy the selectors straight from the HTML source code. If done correctly, the table should have 34 rows. Here is my code and the resulting error:
library(rvest)    # read_html(), html_nodes(), html_text(), %>%
library(stringr)  # str_replace()
library(tibble)   # data_frame()

womens_bb <- read_html("http://gomason.com/schedule.aspx?path=wbball")
womens_opponents <- womens_bb %>%
  html_nodes(".sidearm-schedule-game-opponent-name a") %>%
  html_text()
womens_locations <- womens_bb %>%
  html_nodes(".sidearm-schedule-game-location span:nth-child(1)") %>%
  html_text()
womens_dates <- womens_bb %>%
  html_nodes(".sidearm-schedule-game-opponent-date span:nth-child(1)") %>%
  html_text()
# Note: as.numeric() returns NA for any text that is not purely numeric
womens_times <- womens_bb %>%
  html_nodes(".sidearm-schedule-game-opponent-date span:nth-child(2)") %>%
  html_text() %>%
  as.numeric()
womens_scores <- womens_bb %>%
  html_nodes("div.sidearm-schedule-game-result span:nth-child(3)") %>%
  html_text() %>%
  as.numeric()
womens_win_loss <- womens_bb %>%
  html_nodes(".text-italic span:nth-child(2)") %>%
  html_text() %>%
  str_replace("\\,", "")
womens_df <- data_frame(
  date = womens_dates,
  time = womens_times,
  opponent = womens_opponents,
  location = womens_locations,
  score = womens_scores,
  win_loss = womens_win_loss
)
Error: Columns `date`, `time`, `opponent`, `score`, `win_loss` must be length 1 or 37, not 36, 36, 34, 34, 35
How can I resolve this issue?
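A quick way to see which selector is matching too many or too few nodes is to compare the vector lengths before building the table. The following is only a minimal sketch in base R: the vectors and their values are made up for illustration, not scraped from the page.

```r
# Illustrative vectors only; in practice these would be the scraped columns.
cols <- list(
  date     = c("Nov 6", "Nov 10", "Nov 14"),
  opponent = c("Team A", "Team B"),
  score    = c("70-60", "55-58", "81-77")
)

# lengths() reports the number of elements in each list entry.
n <- lengths(cols)
print(n)

# Treat the most common length as the expected row count, then list the
# columns that deviate from it.
expected <- as.integer(names(which.max(table(n))))
mismatched <- names(n)[n != expected]
print(mismatched)
```

Any column listed in `mismatched` points at a selector worth inspecting.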
Upvotes: 1
Views: 65
Reputation: 496
I think the problem comes from the img tags. To avoid it, you can first collect the enclosing div of each game (I get 36 of them when I run the script) and then loop over those divs, extracting each field inside the loop. Add a small if check for the tags that behave oddly:
womens_bb <- read_html("http://gomason.com/schedule.aspx?path=wbball")
# One node per game, so each field is looked up inside its own row
divs <- womens_bb %>% html_nodes(".sidearm-schedule-game")
df <- NULL  # accumulates one row per game
for (div in divs) {
  womens_opponents <- div %>%
    html_nodes(".sidearm-schedule-game-opponent-name, .sidearm-schedule-game-opponent-name a") %>%
    html_text()
  # Keep the first match and remove runs of whitespace
  womens_opponents <- gsub("\\s{2,}", "", womens_opponents[1])
  womens_locations <- div %>%
    html_nodes(".sidearm-schedule-game-location span:nth-child(1)") %>%
    html_text()
  womens_locations <- womens_locations[1]
  womens_dates <- div %>%
    html_nodes(".sidearm-schedule-game-opponent-date span:nth-child(1)") %>%
    html_text()
  womens_times <- div %>%
    html_nodes(".sidearm-schedule-game-opponent-date span:nth-child(2)") %>%
    html_text()
  womens_scores <- div %>%
    html_nodes("div.sidearm-schedule-game-result span:nth-child(3)") %>%
    html_text()
  # Games without a result yet have no score node
  if (length(womens_scores) == 0) womens_scores <- ""
  womens_win_loss <- div %>%
    html_nodes(".text-italic span:nth-child(2)") %>%
    html_text()
  womens_win_loss <- gsub("\\,", "", womens_win_loss)
  res <- c(date = womens_dates, time = womens_times, opponent = womens_opponents, location = womens_locations, score = womens_scores, win_loss = womens_win_loss)
  print(length(res))  # should be the same for every game
  df <- rbind(df, res)
}
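The same per-game idea can also be written without an explicit loop. This is only a sketch under assumptions: the HTML below is a made-up, simplified stand-in for the real page (including the `span:nth-child(2)` position of the score), and `field()` is a hypothetical helper. It relies on `html_node()` (singular), which returns one result per game and gives `NA` after `html_text()` when a game has no match, so all columns end up the same length.

```r
library(rvest)

# Hypothetical, simplified markup standing in for the real schedule page.
page <- read_html('
<ul>
  <li class="sidearm-schedule-game">
    <div class="sidearm-schedule-game-opponent-name"><a href="#">Team A</a></div>
    <div class="sidearm-schedule-game-result"><span>W,</span><span>70-60</span></div>
  </li>
  <li class="sidearm-schedule-game">
    <div class="sidearm-schedule-game-opponent-name">Team B</div>
  </li>
</ul>')

# One node per game
games <- html_nodes(page, ".sidearm-schedule-game")

# Hypothetical helper: first match of `css` inside each game, NA if absent.
field <- function(css) {
  games %>% html_node(css) %>% html_text(trim = TRUE)
}

schedule <- data.frame(
  opponent = field(".sidearm-schedule-game-opponent-name"),
  score    = field(".sidearm-schedule-game-result span:nth-child(2)"),
  stringsAsFactors = FALSE
)
print(schedule)
```

On the real page the selectors from the question would go into `field()`; a game with no result would then show `NA` in the score column instead of shifting the rows.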
Hope that helps,
Gottavianoni
Upvotes: 1