Vincent Diallo-Nort

Reputation: 85

Guidance for scraping an HTML table

I am trying to get an HTML table out of this page, but I have tried different approaches and they all fail (it looks like the document is malformed).

I tried this way:

library(XML)
x = readHTMLTable("https://www.jpmorganchasecc.com/results/search.php?city_id=16&search=1&gender=m&year=2015")

I got the error

XML does not seem to be XML

Then I tried this:

library(XML)
library(RCurl)
fileURL <- "https://www.jpmorganchasecc.com/results/search.php?city_id=16&search=1&gender=m&year=2015"
xData <- getURL(fileURL)
doc <- xmlParse(xData)

and got:

Failed to parse xmlns

So I was wondering whether I should try to find a way (perhaps a regex?) to extract only the table markup and then parse it.

Upvotes: 1

Views: 73

Answers (2)

hrbrmstr

Reputation: 78842

If you use rvest, you just need to target the proper table with a CSS selector:

library(rvest)

URL <- "https://www.jpmorganchasecc.com/results/search.php?city_id=16&search=1&gender=m&year=2015"
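pg <- read_html(URL)
# html_nodes() returns a node set and html_table() returns a list of data frames,
# so [[1]] takes the first (and only) match for the selector table#results
dat <- html_table(html_nodes(pg, "table#results"))[[1]]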
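Since the live page may change or disappear, here is an offline sketch of the same selector approach, run against a small hypothetical HTML string (the runner names, times and the extra `nav` table are made up for illustration). It shows that `table#results` picks the right table even when the page contains others, and that `html_node()` (singular) returns only the first match, so `html_table()` gives a single data frame without the `[[1]]`. It assumes rvest is installed; `read_html()` treats a string containing markup as literal HTML rather than as a URL.

```r
library(rvest)

# Hypothetical stand-in for the real results page: two tables, only one with id="results"
html <- '<html><body>
<table id="nav"><tr><td>menu</td></tr></table>
<table id="results">
  <tr><th>Place</th><th>Name</th><th>Time</th></tr>
  <tr><td>1</td><td>Runner A</td><td>17:01</td></tr>
  <tr><td>2</td><td>Runner B</td><td>17:15</td></tr>
</table>
</body></html>'

pg <- read_html(html)

# html_node() returns the first matching node, so html_table() yields one data frame;
# the <th> row is used as the header
dat <- html_table(html_node(pg, "table#results"))
print(dat)
```

No regex is needed: the HTML parser tolerates real-world markup, and the CSS selector does the job of isolating the table.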

Upvotes: 1

steven

Reputation: 683

Try this:

library(XML)
library(RCurl)

url <- "https://www.jpmorganchasecc.com/results/search.php?city_id=16&search=1&gender=m&year=2015"

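# Download the raw page first, then let readHTMLTable() parse the HTML text
html <- getURL(url)
tables <- readHTMLTable(html, stringsAsFactors = FALSE)

# Show the structure of all the tables that were pulled
str(tables)

# View the table whose id is "results"
View(tables$results)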
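The second error in the question most likely comes from `xmlParse()`, which requires well-formed XML; real HTML pages often are not. The XML package also offers `htmlParse()`, which uses libxml2's lenient HTML parser. The first error is probably because the XML package's own downloader does not handle https URLs, which is why fetching the page with `getURL()` first helps. Below is a minimal offline sketch of the lenient route, using a hypothetical HTML snippet (with a deliberately unclosed `<p>`) in place of the real page.

```r
library(XML)

# Hypothetical, slightly malformed HTML: the <p> tag is never closed
html <- '<html><body>
<table id="results">
  <tr><th>Place</th><th>Name</th></tr>
  <tr><td>1</td><td>Runner A</td></tr>
  <tr><td>2</td><td>Runner B</td></tr>
</table>
<p>unclosed paragraph
</body></html>'

# htmlParse() tolerates the broken markup, whereas xmlParse() would reject it
doc <- htmlParse(html, asText = TRUE)

# readHTMLTable() returns a list of data frames, one per <table>
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
print(names(tables))
print(tables$results)
```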

Upvotes: 3
