Ashley

Reputation: 413

Extracting multiple values from XML using R

I am having a hard time extracting values from an XML file. I would like to take the average of the "length" values (the value between the <length> tags) for each hour of the day. In this XML file, all of the data comes from the same day: 2013-11-28.

An example is shown below:

 <root>
        <item>
            <time>2013-11-28T00:00:00-05:00</time>
            <day>2013-11-28</day>
            <length>150</length>
        </item>
        <item>
            <time>2010-11-28T00:15:00-05:00</time>
            <day>2010-11-28</day>
            <length>200</length>
        </item>
        <item>
            <time>2010-11-28T00:30:00-05:00</time>
            <day>2010-11-28</day>
            <length>127.83</length>
        </item>
</root>

I would like the output to look something like this:

   hour         average_length
12:00-12:59     some_average
1:00-1:59       some_average
2:00-2:59       some_average

Thank you!

Upvotes: 0

Views: 767

Answers (1)

GGamba

Reputation: 13680

How to read and format xml

Using the xml2 package, and assuming text holds the XML from your example, we use

 library(xml2)
 xml_obj <- read_xml(text)

to create an XML object.
We can navigate this object using the functions in the library, which are described in the xml2 documentation. In this particular case we want to find all the elements of type time and length and then bind them in a data.frame.

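library(magrittr)  # provides the %>% pipe, which xml2 does not export

# Find all elements of type time
times <- xml_find_all(xml_obj, '//time') %>% xml_text()
# Find all elements of type length and convert the text to numbers
lengths <- xml_find_all(xml_obj, '//length') %>% xml_text() %>% as.numeric()

# Combine the two to create the final data.frame
final <- data.frame(time = times, length = lengths, stringsAsFactors = FALSE)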
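To get from there to the per-hour averages the question asks for, one possible approach is to pull the hour out of each timestamp and aggregate on it. The following is only a sketch under a few assumptions: the sample XML is made up for illustration (modeled on the question's structure, with one item moved to a second hour), every timestamp is in ISO 8601 form so the hour sits at characters 12 and 13, and the hour is taken as written in the timestamp, with no time zone conversion. The names hours and hourly are illustrative.

```r
library(xml2)

# Illustrative sample, modeled on the question's structure (not the real data)
text <- "<root>
  <item><time>2013-11-28T00:00:00-05:00</time><length>150</length></item>
  <item><time>2013-11-28T00:15:00-05:00</time><length>200</length></item>
  <item><time>2013-11-28T01:30:00-05:00</time><length>127.83</length></item>
</root>"

xml_obj <- read_xml(text)

# Extract the text of every <time> and <length> element
times   <- xml_text(xml_find_all(xml_obj, "//time"))
lengths <- as.numeric(xml_text(xml_find_all(xml_obj, "//length")))

# Assumption: ISO 8601 timestamps, so characters 12-13 are the hour ("00".."23")
hours <- substr(times, 12, 13)

final <- data.frame(hour = hours, length = lengths, stringsAsFactors = FALSE)

# Mean length for each hour
hourly <- aggregate(length ~ hour, data = final, FUN = mean)
names(hourly)[2] <- "average_length"
print(hourly)
```

With this sample, hour 00 averages 175 and hour 01 averages 127.83. If you need labels such as 12:00-12:59, you can build them from the hour column afterwards.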

Hope this helps

Upvotes: 1
