How to parse a table from Wikipedia using the htmltab package?

Question

I am trying to parse one table located at https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population#Sovereign_states_and_dependencies_by_population, and I would like to use the htmltab package to do it. My current code is shown below, followed by the error it produces. I also tried passing "Rank" and "% of world population" in the which argument, but still received an error. I am not sure what could be wrong.

Please note: I am new to R and web scraping, so an explanation of the code would be a great help.

library(htmltab)

url3 <- "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population#Sovereign_states_and_dependencies_by_population"
list_of_countries <- htmltab(doc = url3, which = "//th[text() = 'Country(or dependent territory)']/ancestor::table")

Error: Couldn't find the table. Try passing (a different) information to the which argument.

mathematical.coffee · Accepted Answer

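This is an XPath problem, not an R problem. If you inspect the HTML of that table, the relevant header is rendered as

Country
(or dependent territory)

The header is split across a line break, so text() on it is just "Country", and comparing against the full string 'Country(or dependent territory)' finds nothing.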

For example, this could work (it is not the only option; you will have to try out various XPath selectors to see which ones match):

htmltab(doc = url3, which = "//th[text() = 'Country']/ancestor::table")

Alternatively, it's the first table on the page, so you could try which = 1 instead.

(NB: in Chrome you can run $x("//th[text() = 'Country']") and similar expressions in the developer console to try these out; other browsers no doubt offer something comparable.)
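
The text() behaviour described above can also be checked offline from R. The following is a minimal sketch that assumes the xml2 package is installed; the HTML string is a hypothetical stand-in for the Wikipedia header cell (the real markup may differ), and the starts-with() selector is just one more option to try, not something htmltab requires.

```r
library(xml2)

# Hypothetical header markup: the label is split by a line break, with the
# second part nested in a child element. The real page may be marked up differently.
html <- paste0(
  "<table><tr>",
  "<th>Country<br/><small>(or dependent territory)</small></th>",
  "<th>Population</th>",
  "</tr></table>"
)
doc <- read_html(html)

# The selector from the question: compares each direct text node of <th>
# against the full label, which no single text node contains.
n_full <- length(xml_find_all(doc, "//th[text() = 'Country(or dependent territory)']"))

# The selector from the answer: matches the text node that is just "Country".
n_short <- length(xml_find_all(doc, "//th[text() = 'Country']"))

# A looser alternative: test the whole string value of the header cell,
# including text inside child elements.
n_loose <- length(xml_find_all(doc, "//th[starts-with(normalize-space(.), 'Country')]"))

print(c(full = n_full, short = n_short, loose = n_loose))
```

If a selector finds the header here, appending /ancestor::table to it gives an expression of the kind that can be passed to the which argument of htmltab.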
