DirkLX
DirkLX

Reputation: 1337

call REST API per row of data.frame in R

I'm trying to make an API call per row of my data.frame, while passing the value of a row as a parameter. There return value should be added to my frame

However, I'm failing to make my code work. Any help is appreciated.

Contents of APPLYapiCallTest.csv:

id  title   returnValue
01  "wine"  ""
02  "beer"  ""
03  "coffee"    ""

Essentially call API with call("wine) and replace the empty field with the result e.g. beverage.

Here's what I got so far.

#call api per row using apply
library(jsonlite)
library(httr)  


callAPI <- function(x) {

  findWhat <- as.character(x) 

  #create ULR
  url1 <- "http://api.nal.usda.gov/ndb/search/?format=json&q="
  url2 <- "&sort=n&max=1&offset=0&api_key=KYG9lsu0nz31SG5yHGdAsM28IuCEGxWWlvdYqelI&location=Chicago%2BIL"
  fURL <- paste(url1, findWhat, url2, sep="")

  apiRet <- data.frame(fromJSON(txt=fURL, flatten = TRUE))

  result <- apiRet[1,c(9)]
  return(result)
}


tData <- data.frame(read.delim("~/Documents/R-SCRIPTS/DATA/APPLYapiCallTest"))

apply(tData[,c('title')], 1, function(x) callAPI(x) )

Upvotes: 0

Views: 1024

Answers (1)

hrbrmstr
hrbrmstr

Reputation: 78832

I think you'd be better off doing something like this:

library(jsonlite)
library(httr)
library(pbapply)

callAPI <- function(x) {

  res <- GET("http://api.nal.usda.gov/ndb/search/",
             query=list(format="json",
                        q=x,
                        sort="n",
                        offset=0,
                        max=16,
                        api_key=Sys.getenv("NAL_API_KEY"),
                        location="Berwick+ME"))

  stop_for_status(res)

  return(data.frame(fromJSON(content(res, as="text"), flatten=TRUE), stringsAsFactors=FALSE))

}


tData <- data.frame(id=c("01", "02", "03"),
                    title=c("wine", "beer", "coffee"),
                    returnValue=c("", "", ""), 
                    stringsAsFactors=FALSE)


dat <- merge(do.call(rbind.data.frame, pblapply(tData$title, callAPI)),
             tData[, c("title", "id")], by.x="list.q", by.y="title", all.x=TRUE)

str(dat)

## 'data.frame': 45 obs. of  12 variables:
##  $ list.q          : chr  "beer" "beer" "beer" "beer" ...
##  $ list.sr         : chr  "28" "28" "28" "28" ...
##  $ list.start      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ list.end        : int  13 13 13 13 13 13 13 13 13 13 ...
##  $ list.total      : int  13 13 13 13 13 13 13 13 13 13 ...
##  $ list.group      : chr  "" "" "" "" ...
##  $ list.sort       : chr  "n" "n" "n" "n" ...
##  $ list.item.offset: int  0 1 2 3 4 5 6 7 8 9 ...
##  $ list.item.group : chr  "Beverages" "Beverages" "Beverages" "Beverages" ...
##  $ list.item.name  : chr  "Alcoholic beverage, beer, light" "Alcoholic beverage, beer, light, BUD LIGHT" "Alcoholic beverage, beer, light, BUDWEISER SELECT" "Alcoholic beverage, beer, light, higher alcohol" ...
##  $ list.item.ndbno : chr  "14006" "14007" "14005" "14248" ...
##  $ id              : chr  "02" "02" "02" "02" ...

This way you're making a structured call via httr::GET for each title in tData and then binding all the results together into a data frame and finally adding back the ID.

This also allows you to put NAL_API_KEY into .Renviron and avoid exposing it in your workflows (and on SO :-)

You can trim out the API result responses you don't need either in the function or outside it.

Upvotes: 3

Related Questions