XML parsing of attributes using R

Question

I've been trying to get into R and thought the best way would be to come up with a project that I like and dig into it. So I wanted to analyze my texting habits. I managed to export my texts as an XML file in the following format:


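    <all>
        <message date="1423813836987" number="+15555555">content of text</message>
        <message date="1423813836987" number="+15555555">another content of text</message>
    </all>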

Now, what I would like to do is to extract the attributes "date" and "number" and the content of each message and create a data frame. My end-goal is to create a graph for each "number" and see how often I text that number.

After looking around, I found the XML package for R. I can extract the content of the message, but have not been able to get the attributes from the single message tag. Everything I found regarding attributes talked about nested tags like:


    <message>
        <date>1423813836987</date>
        <number>555-555</number>
    </message>

Could anybody point me in the right direction? Is there a better way to do something like this? What I have so far is this:

library(XML)
doc <- xmlRoot(xmlTreeParse("~/Desktop/data.xml"))
xml_data <- xmlToList(doc)

But it makes the attributes look funky.

Thank you in advance.

agstudy · Accepted Answer

Here is an option using xpathSApply:

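library(XML)

## create an XML doc; replace the text with your file name
xx <- htmlParse('
<all>
    <message date="1423813836987" number="+15555555">content of text</message>
    <message date="1423813836987" number="+15555555">another content of text</message>
</all>
', asText = TRUE)
## parsing
data.frame(
  date    = xpathSApply(xx, '//all/message', xmlGetAttr, 'date'),
  number  = xpathSApply(xx, '//all/message', xmlGetAttr, 'number'),
  message = xpathSApply(xx, '//all/message', xmlValue))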

##           date    number                 message
## 1 1423813836987 +15555555         content of text
## 2 1423813836987 +15555555 another content of text
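
For the end goal of seeing how often each number is texted, a minimal base R sketch could continue from a data frame shaped like the output above. It assumes that `date` is a Unix timestamp in milliseconds (the export format is not stated in the question, so treat that as a guess), and the names `texts`, `sent_at` and `per_number` are made up for illustration:

```r
# Data frame with the same columns and values as the output above
texts <- data.frame(
  date    = c("1423813836987", "1423813836987"),
  number  = c("+15555555", "+15555555"),
  message = c("content of text", "another content of text"),
  stringsAsFactors = FALSE
)

# Assumption: `date` holds milliseconds since 1970-01-01 UTC,
# so divide by 1000 to get seconds before converting
texts$sent_at <- as.POSIXct(as.numeric(texts$date) / 1000,
                            origin = "1970-01-01", tz = "UTC")

# Count the messages per number
per_number <- as.data.frame(table(number = texts$number),
                            responseName = "n_messages",
                            stringsAsFactors = FALSE)

# One bar per number, bar height = number of messages
barplot(per_number$n_messages,
        names.arg = per_number$number,
        xlab = "number", ylab = "messages")
```

With the real file, `texts` would be the data frame built by the `xpathSApply` calls, and `sent_at` could then be used to group messages by day or month before counting.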
