Function as.Date returns NAs only in some rows

Question

After running this code, NAs were introduced in some rows (34:39), and I do not know why. Could you help? I tried another PC, but the same problem occurred.

# Czech repo rate
library(rvest)
library(dplyr)

link <- "https://www.cnb.cz/cs/casto-kladene-dotazy/Jak-se-vyvijela-dvoutydenni-repo-sazba-CNB/"
page <- read_html(link)

date <- page %>% html_nodes('td:nth-child(1)') %>% html_text()
repo <- page %>% html_nodes('td+ td') %>% html_text()

Sazba <- data.frame(cbind(date, repo))
Sazba$repo <- as.numeric(gsub(",", ".", Sazba$repo))
Sazba$date <- gsub(" ", "", Sazba$date)
str(Sazba)
Sazba$date <- as.Date(gsub("[.]", "/", Sazba$date), "%d/%m/%Y")

Picture of the issue in R-Studio

polkas · Accepted Answer

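I have a solution: the problem was the encoding of these date values. The affected rows contain a different kind of space that gsub(" ", "", ...) does not match, so it stays in the string and as.Date returns NA. I used the stringi package to escape the values, so these strange spaces are turned into the literal text "\u00a0" (U+00A0, the no-break space), which can then be removed.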
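To see why only some rows fail, it can help to look at the code points of a value that parses and one that does not. The sketch below uses two made-up strings (not taken from the page) and base R only; it assumes the hidden character is U+00A0, as the escaped output suggests.

```r
# Two strings that look alike when printed; the second uses U+00A0
# (no-break space) instead of an ordinary space. Both are made-up samples.
ok  <- "7. 5. 2020"
bad <- "7.\u00a05.\u00a02020"

# Removing ordinary spaces only cleans the first string.
ok_clean  <- gsub(" ", "", ok, fixed = TRUE)
bad_clean <- gsub(" ", "", bad, fixed = TRUE)

# Code points reveal the difference: 32 is a space, 160 is U+00A0.
utf8ToInt(ok)
utf8ToInt(bad)

as.Date(ok_clean,  "%d.%m.%Y")   # parses
as.Date(bad_clean, "%d.%m.%Y")   # NA, the hidden character is still there
```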

# Czech repo rate
library(rvest)
library(dplyr)

link <- "https://www.cnb.cz/cs/casto-kladene-dotazy/Jak-se-vyvijela-dvoutydenni-repo-sazba-CNB/"
page <- read_html(link)

date <- page %>% html_nodes('td:nth-child(1)') %>% html_text()
repo <- page %>% html_nodes('td+ td') %>% html_text()

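# Escape non-ASCII characters so the hidden spaces become the visible text \u00a0
date <- stringi::stri_escape_unicode(date)

Sazba <- data.frame(cbind(date, repo))
Sazba$repo <- as.numeric(gsub(",", ".", Sazba$repo))
# Remove the escaped text; fixed = TRUE matches the backslash literally
Sazba$date <- gsub("\\u00a0", "", Sazba$date, fixed = TRUE)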
str(Sazba)
Sazba$date <- as.Date(Sazba$date, "%d.%m.%Y")

Sazba
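
If adding stringi is not desirable, the same cleanup can probably be done in base R by removing the character directly. This is only a sketch on a made-up sample vector, assuming the stray character is U+00A0; `date_clean` and `parsed` are names introduced here for illustration.

```r
# Made-up sample in the shape of the scraped column: the second value
# uses U+00A0 (no-break space) where the first uses ordinary spaces.
date <- c("7. 5. 2020", "26.\u00a03.\u00a02020")

# "\u00a0" is turned into the character itself by R's string parser,
# so a fixed (non-regex) replacement removes it directly.
date_clean <- gsub("\u00a0", "", date, fixed = TRUE)
# Ordinary spaces are removed the same way.
date_clean <- gsub(" ", "", date_clean, fixed = TRUE)

parsed <- as.Date(date_clean, format = "%d.%m.%Y")
parsed
```

Applied to the scraped data, the two gsub calls would take the place of the escape-and-replace steps on the date column.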
