Jaroslav Kotrba

Reputation: 313

Function as.Date returns NAs only in some rows

After running this code, NAs were introduced in some rows (34:39), and I do not know why. Could you help? I tried another PC, but the same problem occurred.

# CZECH REPO RATE
library(rvest)
library(dplyr)

link <- "https://www.cnb.cz/cs/casto-kladene-dotazy/Jak-se-vyvijela-dvoutydenni-repo-sazba-CNB/"
page <- read_html(link)

date <- page %>% html_nodes('td:nth-child(1)') %>% html_text()
repo <- page %>% html_nodes('td+ td') %>% html_text()

Sazba <- data.frame(cbind(date, repo))
Sazba$repo <- as.numeric(gsub(",", ".", Sazba$repo))
Sazba$date <- gsub(" ", "", Sazba$date)
str(Sazba)
Sazba$date <- as.Date(gsub("[.]", "/", Sazba$date), "%d/%m/%Y")

Picture of the issue in R-Studio

Upvotes: 1

Views: 52

Answers (2)

Mohanasundaram

Reputation: 2949

Try using the parsedate package:

library(parsedate)
Sazba$date <- parse_date(Sazba$date)

Upvotes: 0

polkas

Reputation: 4184

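I have a solution. The problem is that these date values contain a different kind of space: a non-breaking space (U+00A0) instead of an ordinary one, so gsub(" ", "", ...) does not remove it and as.Date cannot parse those rows. I use the stringi package to escape the strings, which turns these unusual spaces into the literal text "\u00a0" that can then be removed.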

# CZECH REPO RATE
library(rvest)
library(dplyr)

link <- "https://www.cnb.cz/cs/casto-kladene-dotazy/Jak-se-vyvijela-dvoutydenni-repo-sazba-CNB/"
page <- read_html(link)

date <- page %>% html_nodes('td:nth-child(1)') %>% html_text()
repo <- page %>% html_nodes('td+ td') %>% html_text()

date <- stringi::stri_escape_unicode(date)

Sazba <- data.frame(cbind(date, repo))
Sazba$repo <- as.numeric(gsub(",", ".", Sazba$repo))
Sazba$date <- gsub("\\\\u00a0", "", Sazba$date)
str(Sazba)
Sazba$date <- as.Date(Sazba$date, "%d.%m.%Y")

Sazba
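
To see the effect in isolation, here is a minimal sketch that does not touch the website. The three date strings are made up for illustration (they are not taken from the scraped table), and removing the non-breaking space directly with base R is an alternative to the stringi escaping above, not part of the original solution.

```r
# Toy input (hypothetical values): the second string hides a
# non-breaking space (U+00A0) after each dot, like the failing rows.
dates <- c("26.6.2008", "8.\u00a08.\u00a02008", "7. 11. 2008")

# Removing only ordinary spaces leaves the non-breaking ones in place,
# so as.Date() returns NA for the second element.
naive <- as.Date(gsub(" ", "", dates, fixed = TRUE), "%d.%m.%Y")
print(naive)

# Diagnose: inspect the Unicode code points of the failing string.
# Code point 160 is U+00A0, the non-breaking space.
print(utf8ToInt(dates[2]))

# Remove ordinary and non-breaking spaces, then parse again.
clean <- gsub(" ", "", dates, fixed = TRUE)
clean <- gsub("\u00a0", "", clean, fixed = TRUE)
fixed <- as.Date(clean, "%d.%m.%Y")
print(fixed)
```

Applied to the scraped data, the same two gsub() calls on Sazba$date should make the stri_escape_unicode step unnecessary, assuming the non-breaking space is the only unusual character in that column.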

Upvotes: 2
