Reputation: 41
I'm working on a project studying state-issued municipal bonds, but I am having trouble getting my data. Using the XML package and the code below, I was able to get some of it.
> library(XML)
> nys="http://newyork.municipalbonds.com/bonds/issue/649787N87"
> nys.table=readHTMLTable(nys,asText=TRUE,which=4)
> nys.table=as.data.frame(nys.table)
> head(nys.table)
Trade Date Trade Time Maturity Date Coupon Price Yield Trade Amount Trade Type
1 2012-09-27 2:49pm 2013-Apr 5.000% 102.522 0.289 $270,000 Investor bought
2 2012-09-27 1:17pm 2013-Apr 5.000% 102.290 0.712 $45,000 Inter-dealer
But that site only offers a small sample for free. The official website, EMMA, has the data for free, but I'm having a terrible time scraping it. When I try the same approach as before, I end up with an empty data frame:
nys="http://emma.msrb.org/SecurityView/SecurityDetailsTrades.aspx?cusip=649787N87"
nys.table=readHTMLTable(nys,asText=TRUE)
nys.table=as.data.frame(nys.table)
head(nys.table)
data frame with 0 columns and 0 rows
As far as I can tell (and I'm fairly certain about this), the site shows a standard terms and conditions (T&C) page when you first navigate to it in a web browser. The output of htmlParse(nys) is identical to the source of that T&C page, not the page where the data is actually located. So when the code runs, it looks for tables on the T&C page and finds none.
I figured this would be a fairly common problem, but so far I have not been able to find any posts where someone had a similar issue. If someone could point me in the right direction, I'd be greatly appreciative.
Upvotes: 3
Views: 1394
Reputation: 18323
I finally got it to work. I had to use Web Developer in Firefox, which allowed me to see what name/value pair the site sets for the Disclaimer cookie. Sending that cookie with the request skips the T&C page. Here it is:
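library(RCurl)  # getURLContent, postForm
library(XML)    # htmlParse, readHTMLTable
nys="http://emma.msrb.org/SecurityView/SecurityDetailsTrades.aspx?cusip=649787N87"
# Send the disclaimer cookie so the trade page is returned instead of the T&C page
txt<-getURLContent(nys,cookie='Disclaimer=Ratings')
readHTMLTable(htmlParse(txt, asText = TRUE))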
$ctl00_mainContentArea_tradeSearchResults
Trade Date/Time Settlement Date Price (%) Yield (%) Trade Amt ($) Trade Submission Type
1 09/27/2012 : 02:49 PM 10/02/2012 102.5220 0.289 270,000 Customer bought
2 09/27/2012 : 01:17 PM 10/02/2012 102.29 0.712 45,000 Inter-dealer Trade
3 09/27/2012 : 01:17 PM 10/02/2012 102.29 0.712 45,000 Inter-dealer Trade
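The values in that table come back as text, so they usually need converting before any analysis. Here is a minimal sketch in base R, using two rows copied from the output above. The short column names, the clean_trades helper, and the UTC time zone are placeholders of mine, not anything the site or the XML package defines; the real data frame will have the longer headers shown above. Parsing "%p" (AM/PM) also assumes an English or C locale.

```r
# Two sample rows copied from the output above. In practice this would be
# the data frame returned by readHTMLTable(), with its own column names.
trades <- data.frame(
  trade_time = c("09/27/2012 : 02:49 PM", "09/27/2012 : 01:17 PM"),
  price      = c("102.5220", "102.29"),
  yield      = c("0.289", "0.712"),
  amount     = c("270,000", "45,000"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert the text columns to proper types.
clean_trades <- function(df) {
  # as.character() guards against columns that arrived as factors
  df$trade_time <- as.POSIXct(as.character(df$trade_time),
                              format = "%m/%d/%Y : %I:%M %p",
                              tz = "UTC")  # placeholder time zone (assumption)
  df$price  <- as.numeric(as.character(df$price))
  df$yield  <- as.numeric(as.character(df$yield))
  # Strip thousands separators before converting the trade amount
  df$amount <- as.numeric(gsub(",", "", as.character(df$amount)))
  df
}

cleaned <- clean_trades(trades)
str(cleaned)
```

After this, columns such as amount and yield can be summed, averaged, or plotted directly.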
To get the next 100 rows, you have to post a form with the current "viewstate":
# Get next set
# Pull the current __VIEWSTATE value out of the page we already fetched
viewstate=gsub('.*\"__VIEWSTATE\" value=\"([^\"]*)\".*','\\1',txt)
# Post the form back as if the "next" button had been clicked
txt<-postForm(nys,
              "__VIEWSTATE"=viewstate,
              "__EVENTTARGET"="ctl00$mainContentArea$nextBottomButton",
              .opts=list(cookie='Disclaimer=Ratings'))
readHTMLTable(htmlParse(txt, asText = TRUE))
$ctl00_mainContentArea_tradeSearchResults
Trade Date/Time Settlement Date Price (%) Yield (%) Trade Amt ($) Trade Submission Type
1 06/27/2011 : 01:51 PM 06/30/2011 107.7350 0.65 600,000 Customer sold
2 06/22/2011 : 12:05 PM 06/27/2011 107.1960 0.957 8,000 Customer bought
3 06/22/2011 : 12:05 PM 06/27/2011 106.6960 1.226 8,000 Inter-dealer Trade
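If the regular expression for the viewstate ever stops matching (for example, if the attribute order in the page changes), the hidden field can also be read with an XPath query. This is only a sketch: get_hidden_field is a hypothetical helper of mine, and sample_page is a made-up stand-in for the real page source in txt.

```r
library(XML)

# Hypothetical helper: return the value attribute of a named <input> field,
# or NA if the page has no such field.
get_hidden_field <- function(html_text, field) {
  doc <- htmlParse(html_text, asText = TRUE)
  value <- xpathSApply(doc,
                       sprintf("//input[@name='%s']", field),
                       xmlGetAttr, "value")
  if (length(value) == 0) NA_character_ else value[[1]]
}

# Made-up page fragment standing in for the real response text
sample_page <- '<html><body><form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123=" />
</form></body></html>'

viewstate <- get_hidden_field(sample_page, "__VIEWSTATE")
print(viewstate)
```

With the real page you would call get_hidden_field(txt, "__VIEWSTATE") and pass the result to postForm as before. The same helper could be reused for any other hidden fields the form turns out to require.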
Upvotes: 6