Reputation: 13485
I would like to download a .csv file from my Amazon S3 bucket using R.
I have started using the API documented here: http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectGET.html
I am using the httr package to create the GET request; I just need to work out which parameters are needed to download the relevant file.
I have set response-content-type to text/csv, since I know it's a .csv file I want to download, but the response I get is as follows:
Response [https://s3-zone.amazonaws.com/bucket.name/file.name.csv?response-content-type=text%2Fcsv]
Status: 200
Content-type: text/csv
Date and Time,Open,High,Low,Close,Volume
2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17
2007/01/01 22:53:00,5674.00,5674.00,5673.00,5674.00,42
2007/01/01 22:54:00,5675.00,5676.00,5674.00,5676.00,36
2007/01/01 22:55:00,5675.00,5676.00,5675.00,5676.00,18
2007/01/01 22:56:00,5676.00,5677.00,5674.00,5677.00,64
2007/01/01 22:57:00,5678.00,5678.00,5677.00,5677.00,45
2007/01/01 22:58:00,5679.00,5680.00,5678.00,5680.00,30
.../01/01 22:59:00,5679.00,5679.00,5677.00,5678.00,19
No file is downloaded, and the data seems to be in the response itself. I can extract the character string from the response, and I guess that with some effort it can be converted into a data.frame as originally desired. But is there a better way of downloading the data straight from the GET command and then using read.csv to read it? I think it is a parameter issue; I am just not sure which parameters need to be set for the file to be downloaded.
If people suggest converting the string, this is the structure of the string I have. What commands would I need to convert it into a data.frame?
chr "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:52:00,5675."| __truncated__
Thanks
HLM
Upvotes: 3
Views: 3094
Reputation: 263451
The answer to your second question:
> chr <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
> read.csv(text=chr)
Date.and.Time Open High Low Close Volume
1 2007/01/01 22:51:00 5683 5683 5673 5673 64
If you want extra speed from read.csv, declare the column classes up front so it does not have to guess them:
chr <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
read.csv(text=chr, colClasses=c("POSIXct", rep("numeric", 5) ) )
As for the download itself: assuming the URL is set up properly (and we have nothing to test this on yet), you may want to look at the value of GET( ...)$content.
Perhaps:
infile <- read.csv(text=GET(...)$content, colClasses=c("POSIXct", rep("numeric", 5) ) )
That was not correct, because the content comes across as a "raw" vector. It has to be converted with rawToChar before read.csv can treat it as text. I did a quick search of Nabble (it must be good for something after all) to find a csv file residing on the Web. This is what finally worked:
library(httr) # for GET
# The content field is a raw vector, so decode it to text before parsing
read.csv(text=rawToChar(
  GET("http://nseindia.com/content/equities/scripvol/datafiles/16-11-2012-TO-16-11-2012ACCEQN.csv")[["content"]]
))
Symbol Series Date Prev.Close Open.Price High.Price Low.Price Last.Price Close.Price
1 ACC EQ 16-Nov-2012 1404.4 1410.95 1410.95 1369.45 1374.95 1378.1
Average.Price Total.Traded.Quantity Turnover.in.Lacs Deliverable.Qty X..Dly.Qt.to.Traded.Qty
1 1393.62 132921 1852.41 56899 42.81
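Putting both parts of this answer together for the data in the question, here is a minimal sketch that runs without network access. It fakes the raw payload with charToRaw, on the assumption that GET(...)$content holds the CSV bytes as a raw vector as described above; the names csv_text, payload and prices are made up for illustration.

```r
# Stand-in for GET(...)$content: the CSV text as a raw vector
# (assumption: a real S3 response body would carry the same bytes)
csv_text <- paste0(
  "Date and Time,Open,High,Low,Close,Volume\r\n",
  "2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n",
  "2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17\r\n"
)
payload <- charToRaw(csv_text)

# Decode the bytes into one string, then parse that string as CSV
prices <- read.csv(
  text = rawToChar(payload),
  colClasses = c("POSIXct", rep("numeric", 5))
)
str(prices)
```

With a live request, replacing payload with the content field of the response should give the same result. Current versions of httr also provide content(response, as = "text"), which does the raw-to-text step for you.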
Upvotes: 3
Reputation: 72769
Here's one way:
library(taRifx) # for stack.list
test <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"
stack( sapply( strsplit( test, "\\n" )[[1]], strsplit, split="," ) )
[,1] [,2] [,3] [,4] [,5] [,6]
ret "Date and Time" "Open" "High" "Low" "Close" "Volume\r"
new "2007/01/01 22:51:00" "5683.00" "5683.00" "5673.00" "5673.00" "64\r"
new "2007/01/01 22:51:00" "5683.00" "5683.00" "5673.00" "5673.00" "64\r"
Now convert to a data.frame:
testdat <- stack( sapply( strsplit( test, "\\n" )[[1]], strsplit, split="," ) )
rownames(testdat) <- seq(nrow(testdat)) # Because duplicate rownames aren't allowed in data.frames
colnames(testdat) <- testdat[1,]
testdat <- testdat[-1,]
as.data.frame(testdat)
Date and Time Open High Low Close Volume\r
2 2007/01/01 22:51:00 5683.00 5683.00 5673.00 5673.00 64\r
3 2007/01/01 22:51:00 5683.00 5683.00 5673.00 5673.00 64\r
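The trailing \r in the last column above comes from splitting on "\n" only, while the lines end in "\r\n". A possible base R variant of the same idea, which splits on the full line ending and needs no extra package, is sketched below. The names rows and fields are made up for illustration, and read.csv(text=...) remains the simpler route.

```r
test <- "Date and Time,Open,High,Low,Close,Volume\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\r\n"

# Split into lines on the full CRLF ending so no stray \r is left behind
rows <- strsplit(test, "\r\n", fixed = TRUE)[[1]]

# Split every line into its comma-separated fields
fields <- strsplit(rows, ",", fixed = TRUE)

# Bind the data rows (everything after the header) into a character matrix
testdat <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
names(testdat) <- fields[[1]]

# All columns except the timestamp hold numbers
testdat[-1] <- lapply(testdat[-1], as.numeric)
str(testdat)
```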
Upvotes: 2