Mark

Reputation: 171

Extract multiple JSON documents from AWS S3 using R

I am experimenting with extracting documents from AWS S3 using R. I have managed to extract one document and create a data frame from it. I would now like to extract multiple documents that sit in multiple subfolders of eventstore/footballStats/.

The code below pulls a single document, and it works:

install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat")) # installs (or updates) aws.s3 from the cloudyr repository
library(aws.s3)

# Set credentials for S3
Sys.setenv("AWS_ACCESS_KEY_ID" = "KEY", "AWS_SECRET_ACCESS_KEY" = "AccessKey")

# Fetch one S3 document as a raw vector
DataVector <- get_object("s3://eventstore/footballStats/2017-04-22/13/01/doc1.json")

I then tried the code below to pull all documents from the folder and its subfolders, but I receive an error.

DataVector <- get_object("s3://eventstore/footballStats/2017-04-22/*")

Error:

chr "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error>
<Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><K"| __truncated__

Is there an alternative R package I should be using? Or does get_object() only work for one document, in which case should I be using another function from the aws.s3 library?

Upvotes: 2

Views: 2001

Answers (1)

Mark

Reputation: 171

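Based on the hints from Drj and Thomas I was able to solve this. get_object() fetches a single object by its exact key, so the keys have to be listed first and then fetched one at a time.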

# List the buckets available to these credentials
bucketlist()

# Build a data frame of the objects under the given prefix
dfBucket <- get_bucket_df("eventstore", prefix = "footballStats/2017-04-22/")

# Object keys (paths within the bucket)
path <- dfBucket$Key

# Download each object and collect its contents as a character string
s3Data <- NULL
for (lineN in path) {
  url <- paste0("s3://eventstore/", lineN)
  s3Vector <- get_object(url)      # raw vector
  s3Value <- rawToChar(s3Vector)   # JSON text
  s3Data <- c(s3Data, s3Value)
}
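
If the prefix also holds keys that are not JSON documents (folder placeholders, for instance), the loop above would try to download those as well. Below is a minimal sketch of filtering the keys first, assuming every document of interest ends in .json. The keys vector is a hypothetical stand-in for dfBucket$Key so that the snippet runs without S3 access; only doc1.json comes from the question, and the other entries are made up.

```r
# Hypothetical listing, standing in for dfBucket$Key; only doc1.json
# comes from the question, the other entries are made up.
keys <- c(
  "footballStats/2017-04-22/",
  "footballStats/2017-04-22/13/01/doc1.json",
  "footballStats/2017-04-22/13/02/doc2.json",
  "footballStats/2017-04-22/13/02/notes.txt"
)

# Keep only the keys that end in ".json"
jsonKeys <- grep("\\.json$", keys, value = TRUE)

# Build the s3:// URLs that the loop passes to get_object()
urls <- paste0("s3://eventstore/", jsonKeys)
print(urls)
```

With real data, path <- grep("\\.json$", dfBucket$Key, value = TRUE) would replace the path assignment before the loop.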

To create a data frame from the data, use tidyjson and dplyr. The tidyjson vignette linked below explains this well.

https://cran.r-project.org/web/packages/tidyjson/vignettes/introduction-to-tidyjson.html
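
As a rough sketch of that last step, assuming each document is a flat JSON object: the field names team and goals are hypothetical (the real documents are not shown here), and the s3Data vector below is a small stand-in for the one built by the loop.

```r
library(dplyr)
library(tidyjson)

# Stand-in for the s3Data character vector built by the loop;
# the fields "team" and "goals" are hypothetical.
s3Data <- c(
  '{"team": "A", "goals": 2}',
  '{"team": "B", "goals": 0}'
)

# One row per JSON document, one column per extracted field
dfStats <- s3Data %>%
  spread_values(
    team  = jstring("team"),
    goals = jnumber("goals")
  )

print(dfStats)
```

Nested documents would need the other tidyjson verbs described in the vignette, such as enter_object() and gather_array(), before spread_values().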

Upvotes: 3
