Reputation: 785
All, I am trying to parse one table located at https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population#Sovereign_states_and_dependencies_by_population, and I would like to use the htmltab package for this. My current code is below, followed by the error it produces. I also tried passing "Rank" and "% of world population" in the which argument, but still received an error. I am not sure what could be wrong.
Please note: I am new to R and web scraping, so an explanation of the code would be a great help.
library(htmltab)
url3 <- "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population#Sovereign_states_and_dependencies_by_population"
list_of_countries <- htmltab(doc = url3, which = "//th[text() = 'Country(or dependent territory)']/ancestor::table")
Error: Couldn't find the table. Try passing (a different) information to the which argument.
Upvotes: 1
Views: 265
Reputation: 56915
This is an XPath problem, not an R problem. If you inspect the HTML of that table, the relevant header is:
<th class="headerSort" tabindex="0" role="columnheader button" title="Sort ascending">
Country<br><small>(or dependent territory)</small>
</th>
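So text() on this is just "Country": the "(or dependent territory)" part sits inside the <small> child element, so it is not a direct text node of the <th>.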
For example, this could work (it is not the only option; you may have to try out various XPath selectors to see what matches):
htmltab(doc = url3, which = "//th[text() = 'Country']/ancestor::table")
Alternatively, it's the first table on the page, so you could try which = 1 instead.
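If you want to see the difference without hitting Wikipedia, here is a minimal sketch using the xml2 package on a small inline snippet. The HTML string is a hypothetical, cut-down copy of the header quoted above, not the live page, and the starts-with() selector at the end is my own suggestion of a looser alternative, so treat it as an assumption to test rather than a guaranteed fix.

```r
library(xml2)

# Hypothetical, simplified copy of the header cell (not the real page).
html <- '<table><tr><th>Country<br><small>(or dependent territory)</small></th><th>Population</th></tr></table>'
doc <- read_html(html)

# Direct text-node children of the first <th>; the <small> content is excluded.
direct_text <- xml_text(xml_find_all(doc, "//th[1]/text()"))

# Selector from the question: compares text nodes against the full label.
n_question <- length(xml_find_all(
  doc, "//th[text() = 'Country(or dependent territory)']/ancestor::table"))

# Selector from this answer: compares text nodes against "Country" only.
n_answer <- length(xml_find_all(
  doc, "//th[text() = 'Country']/ancestor::table"))

# Assumed alternative: test the whole string value of the <th> instead.
n_loose <- length(xml_find_all(
  doc, "//th[starts-with(normalize-space(.), 'Country')]/ancestor::table"))

print(direct_text)
print(c(question = n_question, answer = n_answer, loose = n_loose))
```

On this snippet the question's selector should find no table while the other two find one. The real page may contain extra whitespace or markup, so check the selector against the live HTML before relying on it.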
(NB: in Chrome you can run $x("//th[text() = 'Country']") and so on in the developer console to try these things out, and no doubt other browsers offer something similar.)
Upvotes: 1