Rollo99

Reputation: 1613

How to deal with this website in a webscraping format?

I am trying to webscrape this website.

I am applying the same code that I always use to webscrape pages:

url_dv1 <- "https://ec.europa.eu/commission/presscorner/detail/en/qanda_20_171?fbclid=IwAR2GqXLmkKRkWPoy3-QDwH9DzJiexFJ4Sp2ZoWGbfmOR1Yv8POdlLukLRaU"

url_dv1 <- paste(html_text(html_nodes(read_html(url_dv1), "#inline-nav-1 .ecl-paragraph")), collapse = "")

For this website, though, the code doesn't work. Instead, I get: Error in UseMethod("read_xml") : no applicable method for 'read_xml' applied to an object of class "c('xml_document', 'xml_node')".

Why does this happen, and how can I fix it?

Thanks a lot!

Upvotes: 1

Views: 48

Answers (1)

Jakub.Novotny

Reputation: 3047

The problem is that the web page is rendered dynamically, so the content you are after is not present in the HTML that read_html() downloads. You can work around this with PhantomJS (download it from https://phantomjs.org/download.html). You will also need a small custom JavaScript script (see below). The R code below works for me.

library(tidyverse)
library(rvest)

dir_js <- "path/to/a/directory" # directory that holds the JS script; the file must be named javascript.js
url <- "https://ec.europa.eu/commission/presscorner/detail/en/qanda_20_171?fbclid=IwAR2GqXLmkKRkWPoy3-QDwH9DzJiexFJ4Sp2ZoWGbfmOR1Yv8POdlLukLRaU"

system2("path/to/where/you/have/phantomjs.exe", # path to the PhantomJS executable
        args = c(file.path(dir_js, "javascript.js"), url))

read_html("myhtml.html") %>%
  html_nodes("#inline-nav-1 .ecl-paragraph") %>%
  html_text()
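
If the PhantomJS step fails silently, read_html("myhtml.html") will either error or parse a stale file from an earlier run. A minimal sketch of a guard around the system2() call above follows; render_with_phantomjs() is a hypothetical helper name, not part of rvest or PhantomJS, and it assumes the script writes to myhtml.html in the R working directory and that phantomjs is given as a full path.

```r
# Hypothetical helper: run the PhantomJS script and stop early on failure.
render_with_phantomjs <- function(phantomjs, script, url, output = "myhtml.html") {
  # Both arguments are expected to be full paths to existing files.
  if (!file.exists(phantomjs)) stop("PhantomJS executable not found: ", phantomjs)
  if (!file.exists(script)) stop("JavaScript file not found: ", script)

  # Remove output left over from an earlier run so a stale file is never parsed.
  if (file.exists(output)) file.remove(output)

  # system2() returns the exit status of the command (0 means success).
  status <- system2(phantomjs, args = c(shQuote(script), shQuote(url)))
  if (status != 0 || !file.exists(output)) {
    stop("PhantomJS did not produce ", output, " (exit status ", status, ")")
  }
  invisible(output)
}

# Usage (paths are placeholders, as above):
# render_with_phantomjs("path/to/where/you/have/phantomjs.exe",
#                       file.path(dir_js, "javascript.js"), url)
```

Quoting the URL with shQuote() is a precaution, since query strings can contain characters that a shell would otherwise interpret.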


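// This is the JavaScript code; save it in the dir_js directory as javascript.js
// create a webpage object
var page = require('webpage').create(),
    system = require('system');

// the URL to render, passed as the first command-line argument
var url = system.args[1];

// include the File System module for writing to files
var fs = require('fs');

// path to the output file, relative to the working directory
// (the same file is simply overwritten on each run)
var path = 'myhtml.html';

// Open the page, write the rendered HTML to the output file, then exit.
// Note: this final step is missing from the snippet as posted; it is
// reconstructed here as an assumption about what the script must do.
page.open(url, function (status) {
  fs.write(path, page.content, 'w');
  phantom.exit();
});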
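
To illustrate why the rendering step matters for the selector used above, here is a small offline sketch. Both HTML strings are made-up stand-ins (assumptions, not the actual markup of the page): one mimics a response whose container is still empty before any JavaScript runs, the other mimics the same container after rendering.

```r
library(rvest)

# Hypothetical markup: what a server might send before any JavaScript runs.
static_html <- '<html><body><div id="inline-nav-1"></div></body></html>'

# Hypothetical markup: the same container after a browser has filled it in.
rendered_html <- paste0(
  '<html><body><div id="inline-nav-1">',
  '<p class="ecl-paragraph">First paragraph.</p>',
  '<p class="ecl-paragraph">Second paragraph.</p>',
  '</div></body></html>'
)

# Same selector as in the question and the answer.
extract_paragraphs <- function(html) {
  read_html(html) %>%
    html_nodes("#inline-nav-1 .ecl-paragraph") %>%
    html_text()
}

static_text <- extract_paragraphs(static_html)     # no matching nodes
rendered_text <- extract_paragraphs(rendered_html) # two paragraphs

length(static_text)
rendered_text
```

With the empty container the selector matches nothing, so html_text() returns an empty character vector; only the rendered markup yields text, which is the role the PhantomJS output file plays here.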

Upvotes: 1
