Converting XML to dataframe in R studio

Question

I am not a coder, but I am trying to learn R. I have scraped these jobs from Indeed and need them in a data frame for analysis. My file is here

However I used this code:

install.packages("XML")
library("XML")
library("methods")
results <- xmlParse("http://api.indeed.com/ads/apisearch?publisher=8693092939388569&q=data+scientist&sort=&radius=&st=&jt=&start=&limit=2000&fromage=&filter=&latlong=1&co=in&chnl=&userip=1.2.3.4&useragent=Mozilla/%2F4.0%28Firefox%29&v=2", isURL=TRUE)
print(results)
rootnode <- xmlRoot(results)
rootsize <- xmlSize(rootnode)
print(rootsize)
> print(rootsize)
[1] 10

My problem starts with the following code (I think the argument is wrong):

xmldataframe <- xmlToDataFrame("http://api.indeed.com/ads/apisearch?publisher=8693092939388569&q=data+scientist&sort=&radius=&st=&jt=&start=&limit=2000&fromage=&filter=&latlong=1&co=in&chnl=&userip=1.2.3.4&useragent=Mozilla/%2F4.0%28Firefox%29&v=2")
print(xmldataframe)

Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c("Indian Council Of Medical Research (ICMR) Needs ScientistIndian Council of Medical Research (ICMR)INIndiaEmployment SamacharThu, 18 Aug 2016 16:16:15 GMTIndian Council Of Medical Research (ICMR) Needs Scientist. Indian Council of Medical Research (ICMR) invites applications to recruit on vacant posts of...http://www.indeed.co.in/viewjob?jk=20d1db3c7d973199&qd=704PFtVAS6xUi0-OukCaEmfxgGzxqabhMKv0iphFlwZvghJwQWAysomG7BsaL67IpeRHLNudzQ_v_UGEGMFYq0JvivwR6g0dNKs-MyZMxww&indpubnum=8693092939388569&atk=1arpjr78d5upddvtindeed_clk(this,'6618');20d1db3c7d973199falsefalsefalseIndia16 days ago",  : 
  duplicate subscripts for columns
> print(xmldataframe)
Error in print(xmldataframe) : object 'xmldataframe' not found

What am I doing wrong?

Parfait · Accepted Answer

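To use xmlToDataFrame(), first parse the XML document, then pass the repeated elements that will serve as your rows. If you pass only the URL, the function treats the children of the root node as rows. One of those children, <results>, holds many child elements that are all named result, which is most likely what triggers the "duplicate subscripts for columns" error. Fortunately, the XML is not heavily nested, so no further transformation or wrangling is needed.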

library("XML")

# PARSE DOCUMENT FROM URL (paste0 used to break up line for readability)
results <- xmlParse(paste0("http://api.indeed.com/ads/apisearch?publisher=8693092939388569",
                           "&q=data+scientist&sort=&radius=&st=&jt=&start=&limit=2000",
                           "&fromage=&filter=&latlong=1&co=in&chnl=&userip=1.2.3.4",
                           "&useragent=Mozilla/%2F4.0%28Firefox%29&v=2"), isURL=TRUE)

# CONVERT TO DATA FRAME, ONE ROW PER <result> NODE
df <- xmlToDataFrame(nodes = getNodeSet(results, "//results/result"))

Output: a data frame with 25 observations of 19 variables (screenshot omitted).
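The same approach can be tried offline on a small inline document. The sketch below is only an illustration: the XML text, its element names (jobtitle, company, city), and the values are made up for the example and are not real Indeed data. It assumes the XML package is installed.

```r
library("XML")

# Hypothetical, simplified stand-in for the API response (invented values)
xml_text <- paste0(
  "<response>",
  "<query>data scientist</query>",
  "<totalresults>2</totalresults>",
  "<results>",
  "<result><jobtitle>Data Scientist</jobtitle><company>Acme</company><city>Pune</city></result>",
  "<result><jobtitle>Analyst</jobtitle><company>Globex</company><city>Delhi</city></result>",
  "</results>",
  "</response>")

# Parse from a string instead of a URL
doc <- xmlParse(xml_text, asText = TRUE)

# The root's children are metadata plus a single <results> wrapper, not the rows
print(names(xmlRoot(doc)))

# Select the repeated <result> elements; each one becomes a row
rows <- getNodeSet(doc, "//results/result")
jobs <- xmlToDataFrame(nodes = rows)

print(dim(jobs))
print(jobs)
```

Printing the names of the root's children first is a quick way to see which element actually repeats, and therefore which XPath to hand to getNodeSet().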
