Reputation: 85
I am trying to get an HTML table out of this page, but I have tried several approaches and they all fail (it looks like the document is malformed).
I tried this way:
library(XML)
x = readHTMLTable("https://www.jpmorganchasecc.com/results/search.php?city_id=16&search=1&gender=m&year=2015")
I got the error
XML does not seem to be XML
Then I tried this way:
library(RCurl)
fileURL <- "https://www.jpmorganchasecc.com/results/search.php?city_id=16&search=1&gender=m&year=2015"
xData <- getURL(fileURL)
doc <- xmlParse(xData)
and I got:
Failed to parse xmlns
So I was wondering whether I should try to find a way (perhaps a regex?) to extract only the table markup and then parse that?
Upvotes: 1
Views: 73
Reputation: 78842
If you use rvest
then you just need to target the proper table:
library(rvest)
URL <- "https://www.jpmorganchasecc.com/results/search.php?city_id=16&search=1&gender=m&year=2015"
pg <- read_html(URL)
dat <- html_table(html_nodes(pg, "table#results"))[[1]]
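To check the selector logic without hitting the site, here is a minimal offline sketch. The inline HTML is made up for illustration (only the `results` id comes from the selector above), so treat the column names and values as placeholders:

```r
library(rvest)

# Hypothetical stand-in for the real page: two tables, only one has id="results"
sample_html <- '<html><body>
<table id="nav"><tr><td>Home</td></tr></table>
<table id="results">
  <tr><th>Place</th><th>Name</th></tr>
  <tr><td>1</td><td>A. Runner</td></tr>
</table>
</body></html>'

pg <- read_html(sample_html)

# html_nodes() returns every match; html_table() then yields one data frame per match
matches <- html_nodes(pg, "table#results")
dat <- html_table(matches)[[1]]
print(dat)
```

If `matches` comes back empty on the real page, the id is probably different and it is worth listing all `table` nodes first.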
Upvotes: 1
Reputation: 683
Try this:
library(XML)
library(RCurl)
url <- "https://www.jpmorganchasecc.com/results/search.php?city_id=16&search=1&gender=m&year=2015"
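# Download the raw HTML first, then hand the text to the HTML table reader
page <- getURL(url)
tables <- readHTMLTable(page, stringsAsFactors = FALSE)
# Show all the tables that were pulled
str(tables)
# View a particular table (here the one whose id is "results")
View(tables$results)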
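The original attempts most likely failed for two reasons: `xmlParse()` is a strict XML parser while the page is HTML, and as far as I know the XML package cannot fetch `https` URLs by itself, which is why the page is downloaded with `getURL()` first. A small offline sketch of the HTML-parser route, using a made-up page (the markup and values are placeholders, not the real results):

```r
library(XML)

# Hypothetical, deliberately sloppy HTML (unclosed <br>), so not valid XML
sample_html <- '<html><body><br>
<table id="results">
  <tr><th>Place</th><th>Name</th></tr>
  <tr><td>1</td><td>A. Runner</td></tr>
</table>
</body></html>'

# htmlParse() uses the forgiving HTML parser; asText = TRUE marks the
# input as content rather than a file name
doc <- htmlParse(sample_html, asText = TRUE)

# One data frame per <table>; list names come from each table's id where present
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
str(tables$results)
```

So there should be no need for a regex: parsing the whole page as HTML and picking the table afterwards is usually enough.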
Upvotes: 3