Reputation: 65
I gleaned some information from an HTML table online using the XML package:
library("XML")
library("RCurl")
library("rlist")
theurl = getURL("http://www.victoria2wiki.com/Countries_table", .opts = list(ssl.verifypeer = FALSE))
tables <- readHTMLTable(theurl, as.data.frame = TRUE)
tables now holds a list containing the information from the table on the page. We then convert this list to a data frame using:
df <- do.call(rbind.data.frame, tables)
names(df) shows:
[1] " Country\n" " Tier\n" " Population\n" " Literacy\n"
df[,3] shows all of the population numbers. We tried to plot it using:
plot(df[,3])
but the graph is incorrect: it shows the population numbers on the X-axis and does not make sense.
How do we plot country names against their populations, given this simple R data frame? What we want is a simple line plot with populations on the Y-axis and country names on the X-axis.
Upvotes: 1
Views: 32
Reputation: 24252
Here is a possible solution:
library("XML")
library("RCurl")
library("rlist")
theurl = getURL("http://www.victoria2wiki.com/Countries_table", .opts = list(ssl.verifypeer = FALSE))
tables <- readHTMLTable(theurl, as.data.frame = TRUE)
# tables is a list with two elements
# The data frame is stored in the second element of this list
df <- tables[[2]]
colnames(df) <- c("Country", "Tier", "Population", "Literacy")
# Widen the left margin so the country names fit
par(mar=c(3,7,1,1))
# Population is a factor; strip the commas and convert it into a numeric vector
barplot(as.numeric(gsub(",", "", df$Population)),
        names.arg=df$Country, horiz=TRUE, las=1, cex.names=0.6)
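If you would rather have the line plot asked for in the question, something along these lines may work. This is only a sketch: the four rows are made-up placeholder values rather than data from the wiki, and it assumes Population arrives as a factor of comma-separated strings, as noted above. It also shows why plot(df[,3]) looked wrong: calling plot() on a factor draws a bar chart of level counts, so the population strings end up as labels on the X-axis.

```r
# Hypothetical sample rows standing in for the scraped table (not real data)
df <- data.frame(
  Country    = c("Country A", "Country B", "Country C", "Country D"),
  Population = c("1,250,000", "830,500", "2,104,300", "415,000"),
  stringsAsFactors = TRUE  # assumption: mimic factor columns from readHTMLTable
)

# as.numeric() applied directly to a factor returns the internal level codes,
# not the values, so go through character and strip the commas first
pop <- as.numeric(gsub(",", "", as.character(df$Population)))

# Line plot: positions 1..n on the X-axis, relabelled with the country names
par(mar = c(7, 5, 1, 1))
plot(seq_along(pop), pop, type = "b", xaxt = "n",
     xlab = "", ylab = "Population")
axis(1, at = seq_along(pop), labels = as.character(df$Country),
     las = 2, cex.axis = 0.7)
```

With many countries the X-axis labels will overlap, which is one reason the horizontal barplot above may be easier to read.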
Upvotes: 2