h.l.m

Reputation: 13485

Correct parameters to download a file using Amazon S3 API GET requests

I would like to be able to download a .csv file from my Amazon S3 bucket using R.

I have started using the API that is documented here: http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectGET.html

I am using the httr package to create the GET request; I just need to work out the correct parameters to download the relevant file.

I have set response-content-type to text/csv, as I know it's a .csv file I hope to download, but the response I get is as follows:

Response [https://s3-zone.amazonaws.com/bucket.name/file.name.csv?response-content-type=text%2Fcsv]
  Status: 200
  Content-type: text/csv
Date and Time,Open,High,Low,Close,Volume
2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17
2007/01/01 22:53:00,5674.00,5674.00,5673.00,5674.00,42
2007/01/01 22:54:00,5675.00,5676.00,5674.00,5676.00,36
2007/01/01 22:55:00,5675.00,5676.00,5675.00,5676.00,18
2007/01/01 22:56:00,5676.00,5677.00,5674.00,5677.00,64
2007/01/01 22:57:00,5678.00,5678.00,5677.00,5677.00,45
2007/01/01 22:58:00,5679.00,5680.00,5678.00,5680.00,30
 .../01/01 22:59:00,5679.00,5679.00,5677.00,5678.00,19

No file is downloaded, and the data appears to be in the response itself. I can extract the character string from the response, which represents the data, and I guess that with some effort it could be converted into a data.frame as originally desired. But is there a better way of downloading the data straight from the GET command and then using read.csv to read it? I think it is a parameter issue; I am just not sure which parameters need to be set for the file to be downloaded.

If people suggest converting the string instead, this is the structure of the string I have. What commands would I need to convert it into a data.frame?

chr "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:52:00,5675."| __truncated__

Thanks

HLM

Upvotes: 3

Views: 3094

Answers (2)

IRTFM

Reputation: 263451

The answer to your second question:

> chr <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
> read.csv(text=chr)
        Date.and.Time Open High  Low Close Volume
1 2007/01/01 22:51:00 5683 5683 5673  5673     64

If you want extra speed from read.csv, declare the column classes up front:

chr <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
read.csv(text=chr, colClasses=c("POSIXct", rep("numeric", 5)))
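
One way to see what colClasses changes is to compare the column classes with and without it. The following is a minimal sketch on a two-row sample shaped like the string in the question; the object names (csv_text, guessed, typed) are illustrative only and not part of the original post.

```r
# Sample text in the same shape as the response body (CRLF line endings).
csv_text <- paste0(
  "Date and Time,Open,High,Low,Close,Volume\r\n",
  "2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n",
  "2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17\r\n"
)

# Let read.csv guess the column types.
guessed <- read.csv(text = csv_text)

# Declare the types up front: one date-time column, then five numeric columns.
typed <- read.csv(text = csv_text,
                  colClasses = c("POSIXct", rep("numeric", 5)))

# Compare the first class of each column in the two data frames.
sapply(guessed, function(col) class(col)[1])
sapply(typed,   function(col) class(col)[1])
```

With colClasses supplied, the first column should come back as POSIXct rather than whatever read.csv would otherwise infer for it, and read.csv no longer has to inspect the values to decide on a type.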

Assuming the URL is set up properly (and we have nothing to test this on yet), you may want to look at the value of GET( ...)$content

Perhaps:

infile <- read.csv(text=GET(...)$content, colClasses=c("POSIXct", rep("numeric", 5) ) )

Edit:

That was not correct, because the content comes across as a "raw" vector. It needs to be converted from raw to character before it can be read as text. I did a quick search of Nabble (it must be good for something after all) to find a csv file that was residing on the Web. This is what finally worked:

library(httr)  # provides GET()
read.csv(text = rawToChar(
  GET(
    "http://nseindia.com/content/equities/scripvol/datafiles/16-11-2012-TO-16-11-2012ACCEQN.csv"
  )[["content"]]
))
  Symbol Series        Date Prev.Close Open.Price High.Price Low.Price Last.Price Close.Price
1    ACC     EQ 16-Nov-2012     1404.4    1410.95    1410.95   1369.45    1374.95      1378.1
  Average.Price Total.Traded.Quantity Turnover.in.Lacs Deliverable.Qty X..Dly.Qt.to.Traded.Qty
1       1393.62                132921          1852.41           56899                   42.81
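
The same idea can be wrapped up a little more defensively. The sketch below is only an illustration: fetch_csv and download_csv are hypothetical helper names, and it assumes the httr package is installed and that the URL can be read without extra authentication headers (for example a public object or a pre-signed URL), which the question does not confirm. It uses httr's content() to decode the body as text and write_disk() to save the response to a file.

```r
# Hypothetical helper: fetch a CSV over HTTP and parse it in memory.
# Assumes the URL needs no additional authentication headers.
fetch_csv <- function(url, ...) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # raise an R error on 4xx/5xx responses
  body <- httr::content(resp, as = "text", encoding = "UTF-8")
  read.csv(text = body, ...)
}

# Hypothetical helper: write the response body to a file, then read that file.
download_csv <- function(url, path = tempfile(fileext = ".csv"), ...) {
  resp <- httr::GET(url, httr::write_disk(path, overwrite = TRUE))
  httr::stop_for_status(resp)
  read.csv(path, ...)
}

# Offline illustration of the raw-to-text step, with no network involved.
# raw_body stands in for the raw vector held in the response's content.
raw_body <- charToRaw("Open,Close\r\n5683.00,5673.00\r\n")
parsed <- read.csv(text = rawToChar(raw_body))
```

With the placeholder URL from the question, a call might look like fetch_csv("https://s3-zone.amazonaws.com/bucket.name/file.name.csv", colClasses = c("POSIXct", rep("numeric", 5))). Extra arguments are passed through to read.csv.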

Upvotes: 3

Ari B. Friedman

Reputation: 72769

Here's one way:

library(taRifx) # for stack.list
test <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
stack( sapply( strsplit( test, "\\n" )[[1]], strsplit, split="," ) )

    [,1]                  [,2]      [,3]      [,4]      [,5]      [,6]      
ret "Date and Time"       "Open"    "High"    "Low"     "Close"   "Volume\r"
new "2007/01/01 22:51:00" "5683.00" "5683.00" "5673.00" "5673.00" "64\r"    
new "2007/01/01 22:51:00" "5683.00" "5683.00" "5673.00" "5673.00" "64\r"    

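Now convert to a data.frame (the trailing "\r" in the last column is left over because the split is on "\n" only; splitting on "\r\n" instead would remove it):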

testdat <- stack( sapply( strsplit( test, "\\n" )[[1]], strsplit, split="," ) )
rownames(testdat) <- seq(nrow(testdat)) # Because duplicate rownames aren't allowed in data.frames
colnames(testdat) <- testdat[1,]
testdat <- testdat[-1,]
as.data.frame(testdat)
        Date and Time    Open    High     Low   Close Volume\r
2 2007/01/01 22:51:00 5683.00 5683.00 5673.00 5673.00     64\r
3 2007/01/01 22:51:00 5683.00 5683.00 5673.00 5673.00     64\r
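
For comparison, the same split-and-bind approach can be sketched in base R alone, without taRifx. This is an illustrative variant rather than the method above: the names rows, fields and mat are made up for the example, and the time zone passed to as.POSIXct is an assumption, since the question does not say which zone the timestamps are in. Splitting on "\r?\n" consumes the carriage returns, so no "\r" ends up in the last column.

```r
test <- paste0(
  "Date and Time,Open,High,Low,Close,Volume\r\n",
  "2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n",
  "2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17\r\n"
)

# Split into lines on CRLF (or bare LF), then split each line on commas.
rows   <- strsplit(test, "\r?\n")[[1]]
fields <- strsplit(rows, ",", fixed = TRUE)

# Bind the data rows into a character matrix; the first line is the header.
mat <- do.call(rbind, fields[-1])
colnames(mat) <- fields[[1]]

# Convert to a data.frame, then give each column a proper type.
testdat <- as.data.frame(mat, stringsAsFactors = FALSE)
testdat[-1] <- lapply(testdat[-1], as.numeric)
# tz = "UTC" is an assumption; substitute the zone the data was recorded in.
testdat[[1]] <- as.POSIXct(testdat[[1]], format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
```

In practice read.csv(text = test) does all of this in one call, so the manual route is mainly useful when the lines need custom handling before parsing.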

Upvotes: 2
