I am trying to get data from a website (https://armstrade.sipri.org/armstrade/page/values.php) which requires submitting a form. The form has radio buttons and drop-down boxes for selecting a time period (years), countries, and a download method. I am aware that the data can be downloaded manually, but I would like to programmatically download the import data for all countries between 1990 and 2000.
I have tried two different approaches based on answers on SO (code below), but neither produces any results. Ideally, I would like a data frame similar to the one in the downloaded Excel file. Any help or guidance would be greatly appreciated.
Thank you in advance.
The first approach is based on Python code for the same site: Scrape a php webpage that needs a submitted form
library(httr)
library(rvest)
df = httr::POST("https://armstrade.sipri.org/armstrade/html/export_values.php",
                encode = "form",
                body = list('import_or_export' = 'export',
                            'country_code' = 'All',
                            'from' = 1990,
                            'to' = 2000,
                            'summarize' = 'country',
                            'filetype' = 'excel',
                            'Action' = 'Download'),
                verbose())
The second approach I tried is similar to the one in: How to retrieve response by using POST in R
headers = c('Content-Type' = 'application/json; charset=UTF-8')
data = "{'country_code':'All','low_year':'1990','high_year':'2000','import_or_export':'import','summarize':'country','filetype':'html','Action':'Download'}"
r <- httr::POST(url = "https://armstrade.sipri.org/armstrade/html/export_values.php",
                httr::add_headers(.headers = headers), body = data)
Reputation: 6583
I leave the parsing and cleaning to you, but here's a suggestion for the request:
library(tidyverse)
library(httr2)
library(rvest)
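"https://armstrade.sipri.org/armstrade/html/export_values.php" %>%
  request() %>%
  # Send the fields as a URL-encoded form body; supplying a body makes httr2 issue a POST
  req_body_form(
    # presumably 'import' (the value from the question's second attempt) gives import data
    'import_or_export' = 'export',
    # empty string here, where the question tried 'All'
    'country_code' = '',
    # the year fields are 'low_year' and 'high_year', not 'from' and 'to'
    'low_year' = 1990,
    'high_year' = 2000,
    'summarize' = 'country',
    # ask for an HTML page so the result can be read with rvest
    'filetype' = 'html',
    'Action' = 'Download'
  ) %>%
  req_perform() %>%
  resp_body_html() %>%
  # parse every <table> on the page into a list of tibbles
  html_table() %>%
  # the second table holds the values
  getElement(2) %>%
  # drop the first 10 rows so the year header comes first
  slice(11:nrow(.))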
# A tibble: 89 x 14
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1   1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 Total NA
2 Angola     8                 8 NA
3 Argentina 6 0   13 5 5         2 31 NA
4 Aruba             18         18 NA
5 Australia 168 90   30 36 36 16 20 4     400 NA
6 Austria 30 20 20 10 17   18 1 29 23 24 191 NA
7 Belarus       8   7 129 398 63 452 293 1349 NA
8 Belgium 1 1     33 158 57 93 46 45 26 458 NA
9 Brazil 106 127 98 40 54 38 27 27 18     535 NA
10 Bulgaria 6 42 16 28 55 1 21 6 39 167 2 381 NA
# ... with 79 more rows
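For the parsing and cleaning step, here is a minimal sketch of one way to go from that shape to a typed data frame. It assumes the sliced table always looks like the output above: year labels in the first row, a blank first header cell, blank strings for missing values, and an all-NA trailing column. The helper name clean_sipri is hypothetical, and the toy input only copies the header row plus the Austria and Bulgaria rows from the output so that the example runs without hitting the site.

```r
# Toy stand-in for the scraped table (header row + two rows copied from the output above).
rows <- list(
  c("", 1990:2000, "Total", NA),
  c("Austria", 30, 20, 20, 10, 17, "", 18, 1, 29, 23, 24, 191, NA),
  c("Bulgaria", 6, 42, 16, 28, 55, 1, 21, 6, 39, 167, 2, 381, NA)
)
raw <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(raw) <- paste0("X", seq_len(ncol(raw)))

# Hypothetical helper: promote the first row to column names and make values numeric.
clean_sipri <- function(raw) {
  # Drop columns that contain nothing but NA (the empty trailing column).
  keep <- !vapply(raw, function(col) all(is.na(col)), logical(1))
  raw <- raw[, keep, drop = FALSE]

  # The first row holds the year labels; its first cell is blank, so name it.
  header <- unlist(raw[1, ], use.names = FALSE)
  header[1] <- "country"
  out <- as.data.frame(raw[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(out) <- header
  rownames(out) <- NULL

  # Blank cells become NA, everything else becomes numeric.
  out[-1] <- lapply(out[-1], function(col) {
    col <- trimws(col)
    col[!is.na(col) & !nzchar(col)] <- NA
    as.numeric(col)
  })
  out
}

cleaned <- clean_sipri(raw)
str(cleaned)
```

On the real data you would pass the result of the pipeline above to clean_sipri() instead of the toy raw object. If the site changes its layout (for example, a different number of leading rows), the slice() offset and this helper would both need adjusting.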