Reputation: 313
After running this code, NAs were introduced in some rows (34:39), and I do not know why. Could you help? I tried another PC, but the same problem occurred.
# CZECH REPO RATE
library(rvest)
library(dplyr)
link <- "https://www.cnb.cz/cs/casto-kladene-dotazy/Jak-se-vyvijela-dvoutydenni-repo-sazba-CNB/"
page <- read_html(link)
date <- page %>% html_nodes('td:nth-child(1)') %>% html_text()
repo <- page %>% html_nodes('td+ td') %>% html_text()
Sazba <- data.frame(cbind(date, repo))
Sazba$repo <- as.numeric(gsub(",", ".", Sazba$repo))
Sazba$date <- gsub(" ", "", Sazba$date)
str(Sazba)
Sazba$date <- as.Date(gsub("[.]", "/", Sazba$date), "%d/%m/%Y")
Picture of the issue in R-Studio
Upvotes: 1
Views: 52
Reputation: 2949
Try using the parsedate package:
library(parsedate)
Sazba$date <- parse_date(Sazba$date)
Upvotes: 0
Reputation: 4184
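I have a solution. The problem was that the spaces in these date values are not regular spaces but non-breaking spaces (U+00A0), so gsub(" ", "", ...) did not remove them. I use the stringi package to escape the strings, so these strange spaces are turned into the literal text "\u00a0", which can then be removed.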
# CZECH REPO RATE
library(rvest)
library(dplyr)
link <- "https://www.cnb.cz/cs/casto-kladene-dotazy/Jak-se-vyvijela-dvoutydenni-repo-sazba-CNB/"
page <- read_html(link)
date <- page %>% html_nodes('td:nth-child(1)') %>% html_text()
repo <- page %>% html_nodes('td+ td') %>% html_text()
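# Escape non-ASCII characters so the non-breaking space becomes the literal text \u00a0
date <- stringi::stri_escape_unicode(date)
Sazba <- data.frame(cbind(date, repo))
Sazba$repo <- as.numeric(gsub(",", ".", Sazba$repo))
# Remove the escaped non-breaking spaces (the regex matches a backslash followed by u00a0)
Sazba$date <- gsub("\\\\u00a0", "", Sazba$date)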
str(Sazba)
Sazba$date <- as.Date(Sazba$date, "%d.%m.%Y")
Sazba
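As a side note, the same idea can probably be written without the escaping step by matching the non-breaking space directly. This is only a minimal sketch: the sample strings are made up for illustration (they are not copied from the page), and it assumes the scraped strings are UTF-8, as html_text() normally returns them.

```r
# Hypothetical sample values; the second one uses non-breaking
# spaces (U+00A0) instead of regular spaces.
dates <- c("7. 5. 2020", "27.\u00a03.\u00a02020")

# Removing only regular spaces leaves the non-breaking ones in place,
# so the second value cannot be parsed as a date.
naive <- as.Date(gsub(" ", "", dates, fixed = TRUE), "%d.%m.%Y")

# Remove the non-breaking spaces first, then the regular ones.
clean <- gsub("\u00a0", "", dates, fixed = TRUE)
clean <- gsub(" ", "", clean, fixed = TRUE)
fixed <- as.Date(clean, "%d.%m.%Y")

print(is.na(naive))
print(fixed)
```

In the code above this would mean applying the two gsub() calls to Sazba$date instead of calling stri_escape_unicode().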
Upvotes: 2