zutt

Reputation: 25

Issue web scraping a website: not extracting anything

I am trying to extract data from the following website: 'https://2015-2019.kormany.hu/hu/hirek'. When I try to extract, for example, the articles' links from that website using the following code, I get nothing.

library(rvest)
library(dplyr)
library(XML)

url <- 'https://2015-2019.kormany.hu/hu/hirek'

# Drill down to each article's heading link and pull out its href
links <- read_html(url) %>%
  html_nodes("div") %>%
  html_nodes(xpath = '//*[@class="article"]') %>%
  html_nodes("h2") %>%
  html_nodes("a") %>%
  html_attr("href")

links
> character(0)

I get nothing even if I run only the first step of the chain:

links <- read_html(url) %>% html_nodes("div")

links
> character(0)

This is very strange: when I inspect the website, it looks like the code above should return the list of URLs. According to the page source ('view-source:https://2015-2019.kormany.hu/hu/hirek'), there clearly are "div" nodes. Does anyone know what I could be doing wrong?
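In case it is relevant, a quick way to rule out the server response itself would be to fetch the page with the httr package and inspect the status code and body size (the user-agent string below is just an arbitrary example value):

library(httr)

# Fetch the page directly and inspect the raw response
resp <- GET('https://2015-2019.kormany.hu/hu/hirek',
            user_agent('Mozilla/5.0'))  # arbitrary example user agent

status_code(resp)                   # 200 means the request itself succeeded
nchar(content(resp, as = 'text'))   # a tiny body would explain the empty nodeset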

Upvotes: 0

Views: 76

Answers (1)

zutt

Reputation: 25

Today I retried my code and it works perfectly. I am not sure what was happening yesterday.
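If the site was just temporarily unavailable, a small retry wrapper would make the scraper more robust against days like yesterday. This is only a sketch: fetch_links is a made-up helper, and the retry count and delay are arbitrary choices.

library(rvest)

# Hypothetical helper: retry the fetch until the links actually appear
fetch_links <- function(url, tries = 3, delay = 5) {
  for (i in seq_len(tries)) {
    page <- tryCatch(read_html(url), error = function(e) NULL)
    if (!is.null(page)) {
      links <- page %>%
        html_nodes(xpath = '//*[@class="article"]') %>%
        html_nodes("h2") %>%
        html_nodes("a") %>%
        html_attr("href")
      if (length(links) > 0) return(links)
    }
    Sys.sleep(delay)  # wait a bit before the next attempt
  }
  character(0)  # give up and return an empty result, as before
}

links <- fetch_links('https://2015-2019.kormany.hu/hu/hirek')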

Upvotes: 0
