Reputation: 33
I'm a beginner and I have a problem with scraping.
I need to check whether the VIES number is active or inactive for a few clients. For now, I'm trying with only one. On the website, I have to set the form values and submit the form; the browser then redirects to the next page, where I can find the data I'm interested in.
My code is below. Maybe someone can help.
library(rvest)
library(XML)
url <- 'http://ec.europa.eu/taxation_customs/vies/vatResponse.html?locale=pl'
session1 <- html_session(url)
form1 <-html_form(session1)
form1
date <- set_values(form1[[1]], requesterMemberStateCode = "AT-Austria", requesterNumber = "4324")
date
set <- submit_form(session = session1,form = date)
Upvotes: 2
Views: 1821
Reputation: 13680
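First of all, you don't need the XML package; rvest is enough.
You had the form-submission part almost right; you just used the wrong field names. The code below sets memberStateCode and number rather than requesterMemberStateCode and requesterNumber.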
library(rvest)
#> Loading required package: xml2
url <- 'http://ec.europa.eu/taxation_customs/vies/vatResponse.html?locale=pl'
session1 <- html_session(url)
form1 <-html_form(session1)
form1[[1]]
#> <form> 'vowRequest' (POST vatResponse.html)
#>   <select> 'memberStateCode' [0/29]
#>   <input text> '': --
#>   <input text> 'number':
#>   <input text> 'traderName':
#>   <select> 'traderCompanyType' [0/0]
#>   <input text> 'traderStreet':
#>   <input text> 'traderPostalCode':
#>   <input text> 'traderCity':
#>   <select> 'requesterMemberStateCode' [0/30]
#>   <input text> '':
#>   <input text> 'requesterNumber':
#>   <input hidden> 'action': check
#>   <input submit> 'check': Weryfikuj
date <- set_values(form1[[1]], memberStateCode = "AT", number = "4324")
set <- submit_form(session = session1,form = date)
#> Submitting with 'NULL'
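If you want to check which names a form accepts before submitting anything, the following is a minimal offline sketch. It assumes rvest 1.0.0 or later, where set_values() was renamed html_form_set() (and html_session() and submit_form() became session() and session_submit()). The HTML string is a hypothetical stand-in for the real VIES page, and the option value "AT" with the label "AT-Austria" is an assumption based on the values used above.

```r
library(rvest)

# Hypothetical, cut-down copy of the form so the sketch runs without network access.
page <- read_html('
  <form name="vowRequest" method="post" action="vatResponse.html">
    <select name="memberStateCode">
      <option value="AT">AT-Austria</option>
      <option value="PL">PL-Poland</option>
    </select>
    <input type="text" name="number">
    <input type="submit" name="check" value="Weryfikuj">
  </form>')

form <- html_form(page)[[1]]

# These are the names that html_form_set() (formerly set_values()) matches against.
field_names <- names(form$fields)
print(field_names)

# Set fields by name. For the <select>, pass the option value ("AT"),
# not the visible label ("AT-Austria").
filled <- html_form_set(form, memberStateCode = "AT", number = "4324")
print(filled$fields[["number"]]$value)
```

Setting a name that is not in the form raises an error, so a typo in a field name shows up before the request is sent.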
After that, extracting the values you are interested in is easy:
set %>%
read_html() %>%
html_table(fill = TRUE) %>%
purrr::pluck(1) %>%
dplyr::slice(4:n()) %>%
dplyr::select(1:2)
#> # A tibble: 6 x 2
#>   X1                      X2
#>   <chr>                   <chr>
#> 1 Państwo Członkowskie    AT
#> 2 Numer VAT               AT 4324
#> 3 Data zapytania          2018/05/17 14:33:10
#> 4 Nazwa                   ---
#> 5 Adres                   ---
#> 6 Identyfikator zapytania ""
Created on 2018-05-17 by the reprex package (v0.2.0).
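Since the result is a two-column label/value table, one way to pull out a single value is to turn it into a named vector. This is only a sketch: the data frame below is typed in by hand from the output above rather than scraped, and the names result and values are hypothetical.

```r
# Hand-copied stand-in for the scraped table shown above (X1 = label, X2 = value).
result <- data.frame(
  X1 = c("Państwo Członkowskie", "Numer VAT", "Data zapytania",
         "Nazwa", "Adres", "Identyfikator zapytania"),
  X2 = c("AT", "AT 4324", "2018/05/17 14:33:10", "---", "---", ""),
  stringsAsFactors = FALSE
)

# Use the labels as names so that each value can be looked up directly.
values <- setNames(result$X2, result$X1)

print(values[["Numer VAT"]])
print(values[["Data zapytania"]])
```

With several clients, the same lookup can be applied to each client's result in turn and the values collected into one table.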
Upvotes: 1