Reputation: 1468
I want to scrape this website: link
I use GET from httr and get back a JSON-like object whose keys are not quoted, like below:
"hxbase_json1({sum:3003,list:[{Number:'1'...
So jsonlite::fromJSON cannot read it.
My code is:
library(httr)

url <- 'http://stockdata.stock.hexun.com/zrbg/data/zrbList.aspx?'
date <- '2015-12-31'
page <- 1
res <- GET(url, query = list(date = date,
                             count = 20,
                             pname = 20,
                             titType = 'null',
                             page = page))
resC <- content(res)
# This is the call that fails, because the response is not valid JSON
resC1 <- jsonlite::fromJSON(resC)
Is there a package that adds the missing quotes automatically, or is there any other way to read this kind of JSON?
Upvotes: 3
Views: 266
Reputation: 78832
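In the future, please post your R code and the correct URL(s). Technically this is not JSON: it is a JavaScript function call wrapping an object literal (JSONP-style), with unquoted keys and single-quoted strings, which is why jsonlite::fromJSON rejects it. You can do a bit of surgery on the text and enlist the help of the V8 package to evaluate it as JavaScript: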
library(httr)
library(V8)
library(stringi)
library(dplyr)  # provides %>% and glimpse()

res <- GET("http://stockdata.stock.hexun.com/zrbg/data/zrbList.aspx?date=2015-12-31&count=20&pname=20&titType=null&page=1&callback=hxbase_json11479871629254")

ctx <- v8()

# Turn the callback wrapper into a plain JavaScript assignment and evaluate it
content(res, as = "text") %>%
  stri_replace_first_fixed("hxbase_json1(", "var dat=") %>%
  stri_replace_last_fixed(")", "") %>%
  ctx$eval()

# Pull the resulting object back into R
ctx$get("dat") %>%
  dplyr::glimpse()
## List of 2
## $ sum : int 3003
## $ list:'data.frame': 20 obs. of 13 variables:
## ..$ Number : chr [1:20] "1" "2" "3" "4" ...
## ..$ StockNameLink: chr [1:20] "stock_bg.aspx?code=000002&date=2015-12-31" "stock_bg.aspx?code=601601&date=2015-12-31" "stock_bg.aspx?code=000550&date=2015-12-31" "stock_bg.aspx?code=000001&date=2015-12-31" ...
## ..$ industry : chr [1:20] "万科A(000002)" "中国太保(601601)" "江铃汽车(000550)" "平安银行(000001)" ...
## ..$ stockNumber : chr [1:20] "24.36" "24.07" "23.01" "18.69" ...
## ..$ industryrate : chr [1:20] "90.27" "86.41" "84.29" "84.14" ...
## ..$ Pricelimit : chr [1:20] "A" "A" "A" "A" ...
## ..$ lootingchips : chr [1:20] "15.00" "15.00" "9.03" "15.00" ...
## ..$ Scramble : chr [1:20] "15.00" "12.00" "20.00" "15.00" ...
## ..$ rscramble : chr [1:20] "8.00" "6.00" "18.00" "8.00" ...
## ..$ Strongstock : chr [1:20] "27.91" "29.34" "14.25" "27.45" ...
## ..$ Hstock : chr [1:20] " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-14/1202040307.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-28/1202085787.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-19/1202057166.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ " <a href =\"http://www.cninfo.com.cn/finalpage/2016-03-10/1202033377.PDF\" target=\"_blank\"><img alt=\"\" src=\"img/table_btn1"| __truncated__ ...
## ..$ Wstock : chr [1:20] "<a href =\"http://stockdata.stock.hexun.com/000002.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/601601.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/000550.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" "<a href =\"http://stockdata.stock.hexun.com/000001.shtml\" target=\"_blank\"><img alt=\"\" src=\"img/icon_02.gif\"></img ></a>" ...
## ..$ Tstock : chr [1:20] "<img alt=\"\" onclick=\"addIStock('000002','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('601601','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('000550','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" "<img alt=\"\" onclick=\"addIStock('000001','1');\" code=\"\" codetype=\"\" \" src=\"img/icon_03.gif\"></img >" ...
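If you want to try the idea without hitting the site, here is a minimal offline sketch. It assumes the response always has the shape name( ... ), and the helper name read_js_callback is hypothetical (it is not part of V8 or any other package). The sample string is a shortened, made-up stand-in modelled on the snippet in the question, not real data from the site.

```r
library(V8)

# Hypothetical helper: parse a JSONP-style string such as "cb({a:1})"
read_js_callback <- function(txt) {
  # Drop the leading "name(" and the trailing ")" (optionally followed by ";")
  body <- sub("^\\s*[A-Za-z_$][A-Za-z0-9_$.]*\\s*\\(", "", txt, perl = TRUE)
  body <- sub("\\)\\s*;?\\s*$", "", body, perl = TRUE)
  ctx <- v8()
  # Parentheses make V8 read the text as an expression, not a block statement
  ctx$eval(paste0("var dat = (", body, ");"))
  ctx$get("dat")
}

# Made-up sample with unquoted keys and single-quoted strings
sample_txt <- "hxbase_json1({sum:3003,list:[{Number:'1',industry:'A'},{Number:'2',industry:'B'}]})"
dat <- read_js_callback(sample_txt)
str(dat)
```

Stripping the wrapper with a regular expression rather than a fixed string means the callback name does not have to be hard-coded, but it is still only a sketch: it will not cope with a response that has extra text before the callback name or after the closing parenthesis.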
Upvotes: 4