Ghgh Lhlh

Reputation: 165

How to get the value of a specified node in an XML tree using R

I have an XML file that includes a root node and child nodes with attributes that hold values.

I am using R to work on the XML file.

I need to display the employees that are in the IT department.

How can I display the ID or the name of the employees in the IT department?

I used this code:

print(getNodeSet(rootnode,"//EMPLOYEE/DEPT[@DEPT='IT']"))

where rootnode is the variable that holds the root node, RECORDS.

It did not work.

XML file:

<RECORDS>
   <EMPLOYEE>
      <ID>1</ID>
      <NAME>Rick</NAME>
      <SALARY>623.3</SALARY>
      <STARTDATE>1/1/2012</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>2</ID>
      <NAME>Dan</NAME>
      <SALARY>515.2</SALARY>
      <STARTDATE>9/23/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>3</ID>
      <NAME>Michelle</NAME>
      <SALARY>611</SALARY>
      <STARTDATE>11/15/2014</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>4</ID>
      <NAME>Ryan</NAME>
      <SALARY>729</SALARY>
      <STARTDATE>5/11/2014</STARTDATE>
      <DEPT>HR</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>5</ID>
      <NAME>Gary</NAME>
      <SALARY>843.25</SALARY>
      <STARTDATE>3/27/2015</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>6</ID>
      <NAME>Nina</NAME>
      <SALARY>578</SALARY>
      <STARTDATE>5/21/2013</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>7</ID>
      <NAME>Simon</NAME>
      <SALARY>632.8</SALARY>
      <STARTDATE>7/30/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>8</ID>
      <NAME>Guru</NAME>
      <SALARY>722.5</SALARY>
      <STARTDATE>6/17/2014</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>

</RECORDS>

Upvotes: 2

Views: 484

Answers (1)

Prem

Reputation: 11955

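In your XML, DEPT is a child element of EMPLOYEE, not an attribute, so the predicate has to test the element itself (DEPT='IT') rather than @DEPT. Put the predicate on EMPLOYEE and then step down to the child you want, as below.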

getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']/NAME")
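getNodeSet returns the matching nodes, not their text. If you only want the values (the name or the ID), one option is xpathSApply with xmlValue. The following is a minimal sketch, assuming a shortened copy of the XML from the question; the variable names xml_text, it_names and it_ids are made up for illustration.

```r
library(XML)

# Assumed sample document: a shortened copy of the XML in the question
xml_text <- "<RECORDS>
  <EMPLOYEE><ID>1</ID><NAME>Rick</NAME><DEPT>IT</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>2</ID><NAME>Dan</NAME><DEPT>Operations</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><DEPT>IT</DEPT></EMPLOYEE>
</RECORDS>"
xml_data <- xmlParse(xml_text, asText = TRUE)

# xpathSApply applies xmlValue to every matched node and
# simplifies the result to a character vector
it_names <- xpathSApply(xml_data, "//EMPLOYEE[DEPT='IT']/NAME", xmlValue)
it_ids   <- xpathSApply(xml_data, "//EMPLOYEE[DEPT='IT']/ID", xmlValue)

print(it_names)
print(it_ids)
```

Both results are character vectors, so the IDs would need as.integer() if you want numbers.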


If you want more than one column in the output:

library(XML)
library(dplyr)

# sample data (a shortened version of the XML in the question)
xml_data <- xmlParse("<RECORDS>
  <EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY><DEPT>IT</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY><DEPT>Operations</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><SALARY>611</SALARY><DEPT>IT</DEPT></EMPLOYEE>
  </RECORDS>")

# keep only the EMPLOYEE nodes whose DEPT child is 'IT',
# convert them to a data frame, then pick the columns to show
df <- xmlToDataFrame(nodes = getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']")) %>%
  select(NAME, SALARY)
df

Output is:

      NAME SALARY
1     Rick  623.3
2 Michelle    611

(Edit - modified code to have more than one column in the output)
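If you would rather not load dplyr, or you need SALARY as a number, the same result can be sketched in base R. This assumes the same shortened sample data; it_df is just an illustrative name. The values read from the XML are text, so the column is converted explicitly.

```r
library(XML)

# Assumed sample document: a shortened copy of the XML in the question
xml_data <- xmlParse("<RECORDS>
  <EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY><DEPT>IT</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY><DEPT>Operations</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><SALARY>611</SALARY><DEPT>IT</DEPT></EMPLOYEE>
</RECORDS>", asText = TRUE)

# One data frame row per EMPLOYEE node in the IT department
it_df <- xmlToDataFrame(nodes = getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']"))

# Base-R column selection instead of dplyr::select
it_df <- it_df[, c("NAME", "SALARY")]

# as.character() first, so the conversion is also correct if the column is a factor
it_df$SALARY <- as.numeric(as.character(it_df$SALARY))

print(it_df)
print(mean(it_df$SALARY))
```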

Upvotes: 1
