Reputation: 1245
I scraped this data from the OCC website and got back a space-delimited ASCII file. I want to turn this string into a data frame.
I have tried read.table and readr::read_tsv, but I am not getting the desired results. Below is the code to get access to the data I am trying to convert.
library(rvest)
library(readr)
data = read_html('https://www.theocc.com/webapps/series-search?symbolType=U&symbol=AAPL')%>%html_text()
x = read.table(data, header = T)
x = read_tsv(data)
I expected the result to come out as a data frame, but instead read.table() prints the result to the console with an error and a warning message.
Upvotes: 2
Views: 552
Reputation: 15065
The downloaded file contains descriptive content above the header; actually 6 lines:
Series Search Results for AAPL
Products for this underlying symbol are traded on:
AMEX ARCA BATS BOX C2 CBOE EDGX GEM ISE MCRY MIAX MPRL NOBO NSDQ PHLX
Series/contract Strike Open Interest
ProductSymbol year Month Day Integer Dec C/P Call Put Position Limit
AAPL 2019 01 25 100 000 C P 0 190 25000000
AAPL 2019 01 25 105 000 C P 0 127 25000000
AAPL 2019 01 25 110 000 C P 0 87 25000000
AAPL 2019 01 25 115 000 C P 0 314 25000000
...
You can read it via read_tsv(skip = 6):
library(rvest)
library(readr)
df <- read_html(
'https://www.theocc.com/webapps/series-search?symbolType=U&symbol=AAPL'
) %>%
html_text() %>%
read_tsv(
skip = 6
)
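For what it's worth, read.table(data) fails in the question because its first argument is a file path, so the scraped string gets treated as a filename; read_tsv accepts a string containing newlines as literal data. Here is a minimal offline sketch of the skip idea. The text in txt is a made-up miniature (two descriptive lines, hypothetical column names), not the real OCC output, so it uses skip = 2 rather than 6:

```r
library(readr)

# Hypothetical miniature of the downloaded text: two descriptive lines,
# followed by a tab-separated header and two data rows.
txt <- paste(
  "Series Search Results for DEMO",
  "Some descriptive line",
  "Symbol\tYear\tStrike",
  "DEMO\t2019\t100",
  "DEMO\t2019\t105",
  sep = "\n"
)

# A string containing newlines is parsed as literal data, not as a path.
# skip = 2 drops the two descriptive lines so the header row is read first.
demo <- read_tsv(txt, skip = 2)
print(demo)
```

The number passed to skip has to match the number of lines above the header in the actual download, so it is worth checking that count with something like readLines() before relying on it.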
However, the first column has a wide header and two TABs separate it from the next column, so the column names end up misaligned with the data.
You'll have to do some massaging:
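library(dplyr)  # provides select()
# Keep the first 10 column names, i.e. the real headers
dfnames <- names(df)[1:10]
# Drop the surplus column (the one named `year` after the read)
df <- df %>%
select(-year)
# Reapply the header names to the remaining columns
names(df) <- dfnames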
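To see what this realignment does, here is a small sketch on a toy data frame. The shape of shifted is an assumption made for illustration (an empty column that picked up the name year, with every later value sitting one column to the right); the real result of the read may differ in detail:

```r
library(dplyr)

# Toy stand-in for the misaligned result (assumed shape, not real OCC data):
# the empty column took the name "year", so later values are shifted right.
shifted <- data.frame(
  ProductSymbol = c("AAPL", "AAPL"),
  year          = c(NA, NA),
  Month         = c(2019, 2019),
  Day           = c("01", "01"),
  X5            = c(25, 25),
  stringsAsFactors = FALSE
)

# Keep as many names as there are real columns (4 in this toy case).
keep_names <- names(shifted)[1:4]

# Drop the empty column, then reapply the names so they line up again.
fixed <- shifted %>% select(-year)
names(fixed) <- keep_names
print(fixed)
```

After the fix, the column called year holds 2019 and Day holds 25, which is the alignment the header row intended.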
Upvotes: 2