Funkeh-Monkeh
Funkeh-Monkeh

Reputation: 661

Scraping data from API using R

I am quite new to scraping data via API's using R.

I tried to connect to The Malaysian Meteorological Department API using R and I got the curl script below from the references provided.

curl -H "Authorization: METToken MYTOKENID" "http://api.met.gov.my/v2/data?datasetid=FORECAST&datacategoryid=GENERAL&locationid=LOCATION:237&start_date=2017-08-13&end_date=2017-08-13"

How do I use R to pull some data using that? I have an API Token provided to me when I registered.

Thanks!

Upvotes: 0

Views: 167

Answers (1)

hrbrmstr
hrbrmstr

Reputation: 78832

If you're using an API, it's not webscraping.

Before doing anything else, first, store your token in MYWX_TOKEN in ~/.Renviron and restart R/RStudio.

Then, do:

devtools::install_github("hrbrmstr/mywx")

Which will enable you to do the following:

library(mywx)
library(tidyverse)

mywx_districts()
## # A tibble: 51 x 6
##             id        name locationcategoryid locationrootid latitude longitude
##  *       <chr>       <chr>              <chr>          <chr>    <dbl>     <dbl>
##  1 LOCATION:17  BATU PAHAT           DISTRICT     LOCATION:1 1.854800  102.9325
##  2 LOCATION:18 JOHOR BAHRU           DISTRICT     LOCATION:1 1.465500  103.7578
##  3 LOCATION:19      KLUANG           DISTRICT     LOCATION:1 2.025100  103.3328
##  4 LOCATION:20 KOTA TINGGI           DISTRICT     LOCATION:1 1.738100  103.8999
##  5 LOCATION:21      LEDANG           DISTRICT     LOCATION:1 2.262401  102.6498
##  6 LOCATION:22     MERSING           DISTRICT     LOCATION:1 2.431200  103.8405
##  7 LOCATION:23        MUAR           DISTRICT     LOCATION:1 2.044200  102.5689
##  8 LOCATION:24    NUSAJAYA           DISTRICT     LOCATION:1 1.413590  103.6317
##  9 LOCATION:25     SEGAMAT           DISTRICT     LOCATION:1 2.514800  102.8158
## 10 LOCATION:26     PONTIAN           DISTRICT     LOCATION:1 1.516380  103.3839
## # ... with 41 more rows

mywx_states()
## # A tibble: 16 x 5
##             id            name locationcategoryid latitude longitude
##  *       <chr>           <chr>              <chr>    <dbl>     <dbl>
##  1  LOCATION:1           JOHOR              STATE 1.465500  103.7578
##  2  LOCATION:2           KEDAH              STATE 6.121040  100.3601
##  3  LOCATION:3        KELANTAN              STATE 6.056660  102.2645
##  4  LOCATION:4    KUALA LUMPUR              STATE 3.143000  101.6948
##  5  LOCATION:5          LABUAN              STATE 4.890934  114.9428
##  6  LOCATION:6          MELAKA              STATE 2.231926  102.2943
##  7  LOCATION:7 NEGERI SEMBILAN              STATE 2.729700  101.9381
##  8  LOCATION:8          PAHANG              STATE 3.807700  103.3260
##  9  LOCATION:9    PULAU PINANG              STATE 5.411230  100.3354
## 10 LOCATION:10           PERAK              STATE 4.584100  101.0829
## 11 LOCATION:11          PERLIS              STATE 6.441400  100.1986
## 12 LOCATION:12       PUTRAJAYA              STATE 2.916670  101.7000
## 13 LOCATION:13           SABAH              STATE 5.974900  116.0724
## 14 LOCATION:14         SARAWAK              STATE 1.583330  110.3333
## 15 LOCATION:15        SELANGOR              STATE 3.085070  101.5328
## 16 LOCATION:16      TERENGGANU              STATE 5.330200  103.1408

mywx_towns()
## # A tibble: 51 x 6
##              id        name locationcategoryid locationrootid latitude longitude
##  *        <chr>       <chr>              <chr>          <chr>    <dbl>     <dbl>
##  1 LOCATION:122  AYER HITAM               TOWN     LOCATION:1   1.9150  103.1808
##  2 LOCATION:123  BATU PAHAT               TOWN     LOCATION:1   1.8548  102.9325
##  3 LOCATION:124 JOHOR BAHRU               TOWN     LOCATION:1   1.4655  103.7578
##  4 LOCATION:125       LABIS               TOWN     LOCATION:1   2.3850  103.0210
##  5 LOCATION:126     TANGKAK               TOWN     LOCATION:1   2.2673  102.5453
##  6 LOCATION:127        MUAR               TOWN     LOCATION:1   2.0442  102.5689
##  7 LOCATION:128       PAGOH               TOWN     LOCATION:1   2.1495  102.7704
##  8 LOCATION:129      KLUANG               TOWN     LOCATION:1   2.0251  103.3328
##  9 LOCATION:130 KOTA TINGGI               TOWN     LOCATION:1   1.7381  103.8999
## 10 LOCATION:131     MERSING               TOWN     LOCATION:1   2.4312  103.8405
## # ... with 41 more rows

mywx_touristdests()
## # A tibble: 30 x 6
##              id              name locationcategoryid locationrootid latitude longitude
##  *        <chr>             <chr>              <chr>          <chr>    <dbl>     <dbl>
##  1 LOCATION:310     BATU FERINGGI        TOURISTDEST     LOCATION:9  5.47090  100.2453
##  2 LOCATION:311     BUKIT BENDERA        TOURISTDEST     LOCATION:9  2.37330  102.5104
##  3 LOCATION:312      BUKIT TINGGI        TOURISTDEST     LOCATION:8  2.28720  103.6726
##  4 LOCATION:313      BUKIT FRASER        TOURISTDEST     LOCATION:8  3.71260  101.7412
##  5 LOCATION:314 CAMERON HIGHLANDS        TOURISTDEST     LOCATION:8  4.48333  101.4500
##  6 LOCATION:315         CHERATING        TOURISTDEST     LOCATION:8  4.12557  103.3939
##  7 LOCATION:316            DESARU        TOURISTDEST     LOCATION:1  1.54020  104.2680
##  8 LOCATION:317 GENTING HIGHLANDS        TOURISTDEST     LOCATION:8  3.39545  101.7792
##  9 LOCATION:318             KIJAL        TOURISTDEST    LOCATION:16  4.35000  103.4833
## 10 LOCATION:319             LUMUT        TOURISTDEST    LOCATION:10  4.23230  100.6298
## # ... with 20 more rows

You can use those as lookup tables to then use the id for obtaining forecast data:

glimpse(mywx_forecast("LOCATION:237", "2017-08-13", "2017-08-13"))
## Observations: 6
## Variables: 12
## $ locationid       <chr> "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:...
## $ locationname     <chr> "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA"
## $ locationrootid   <chr> "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12"
## $ locationrootname <chr> "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA"
## $ date             <dttm> 2017-08-12 16:00:00, 2017-08-12 16:00:00, 2017-08-12 16:00:00, 2017-08-12 16:00:00, 2017-...
## $ datatype         <chr> "FGM", "FGA", "FGN", "FMINT", "FMAXT", "FSIGW"
## $ value            <chr> "Cloudy", "Thunderstorms", "Cloudy", "26", "33", "Thunderstorms"
## $ latitude         <dbl> 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667
## $ longitude        <dbl> 101.7, 101.7, 101.7, 101.7, 101.7, 101.7
## $ attributes.unit  <chr> NA, NA, NA, "Celcius", "Celcius", NA
## $ attributes.code  <chr> NA, NA, NA, NA, NA, "tstorm"
## $ attributes.when  <chr> NA, NA, NA, NA, NA, "Afternoon"

Even for a range:

vals <- mywx_forecast("LOCATION:237", "2017-08-01", "2017-08-13")

glimpse(vals)
## Observations: 51
## Variables: 12
## $ locationid       <chr> "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:237", "LOCATION:...
## $ locationname     <chr> "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA",...
## $ locationrootid   <chr> "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12", "LOCATION:12", ...
## $ locationrootname <chr> "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA", "PUTRAJAYA",...
## $ date             <dttm> 2017-08-06 16:00:00, 2017-08-06 16:00:00, 2017-08-06 16:00:00, 2017-08-06 16:00:00, 2017-...
## $ datatype         <chr> "FGM", "FGA", "FGN", "FMINT", "FMAXT", "FSIGW", "FGM", "FGA", "FGN", "FMINT", "FMAXT", "FS...
## $ value            <chr> "No rain", "Rain", "No rain", "25", "33", "Rain", "Rain", "Rain", "No rain", "24", "34", "...
## $ latitude         <dbl> 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, 2.91667, ...
## $ longitude        <dbl> 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7, 101.7,...
## $ attributes.unit  <chr> NA, NA, NA, "Celcius", "Celcius", NA, NA, NA, NA, "Celcius", "Celcius", NA, NA, NA, NA, "C...
## $ attributes.code  <chr> NA, NA, NA, NA, NA, "rain", NA, NA, NA, NA, NA, "rain", NA, NA, NA, NA, NA, "sunny", NA, N...
## $ attributes.when  <chr> NA, NA, NA, NA, NA, "Afternoon", NA, NA, NA, NA, NA, "Morning and Afternoon", NA, NA, NA, ...

The return format is not really optimal for "data science" ops, but you can whack it to work:

dplyr::filter(vals, datatype %in% c("FMINT", "FMAXT")) %>% 
  mutate(value = as.numeric(value)) %>% 
  ggplot(aes(date, value, color = datatype)) +
  geom_line()

enter image description here

Upvotes: 3

Related Questions