animal

Reputation: 1004

Nested Parsing in XPath using Pig

I am trying to parse an XML file with nested tags using Pig. Here is a sample of the XML:

  <Document>
    <medicationsInfo>
     <code>10160-0</code>
     <entryInfo> 
        <statusCode>completed</statusCode>
        <startTime>20110729</startTime>
        <endTime>20110822</endTime>
        <strengthValue>24</strengthValue>
        <strengthUnits>h</strengthUnits>
     </entryInfo> 
     <entryInfo>
        <statusCode>completed</statusCode>
        <startTime>20120130</startTime>
        <endTime>20120326</endTime>
        <strengthValue>12</strengthValue>
        <strengthUnits>h</strengthUnits>
     </entryInfo>
     <entryInfo>
        <statusCode>completed</statusCode>
        <startTime>20100412</startTime>
        <endTime>20110822</endTime>
        <strengthValue>8</strengthValue>
        <strengthUnits>d</strengthUnits>
     </entryInfo>  
    </medicationsInfo>
    <ProductInfo>
     <code>10160-0</code>
     <entryInfo> 
        <statusCode>completed</statusCode>
        <startTime>20110729</startTime>
        <endTime>20110822</endTime>
        <strengthValue>24</strengthValue>
        <strengthUnits>h</strengthUnits>
     </entryInfo> 
     <entryInfo>
        <statusCode>completed</statusCode>
        <startTime>20120130</startTime>
        <endTime>20120326</endTime>
        <strengthValue>12</strengthValue>
        <strengthUnits>h</strengthUnits>
     </entryInfo>
     <entryInfo>
        <statusCode>completed</statusCode>
        <startTime>20100412</startTime>
        <endTime>20110822</endTime>
        <strengthValue>8</strengthValue>
        <strengthUnits>d</strengthUnits>
     </entryInfo>  
    </ProductInfo>
   </Document>

I wrote the code below to get the entryInfo results under medicationsInfo, but I am getting an error.

Code:

Register /home/cloudera/piggybank-0.16.0.jar;
DEFINE XPathAll org.apache.pig.piggybank.evaluation.xml.XPathAll();
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
A =  LOAD '/home/cloudera/Parsed_CCD.xml' using org.apache.pig.piggybank.storage.XMLLoader('medicationsInfo/entryInfo') as (x:chararray);
B = FOREACH A GENERATE XPathAll(x, 'statusCode',false,true), XPathAll(x, 'medicationsInfo/code/code',false,true).$0, XPathAll(x,'strengthValue',false,true).$1;
DUMP B;

Error:

[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B

Expected Output:

completed 20110729 20110822 24 h
completed 20120130 20120326 12 h
completed 20100412 20110822  8 d

Upvotes: 0

Views: 700

Answers (1)

Rijul

Reputation: 1445

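The code below generates the expected output. It loads each medicationsInfo element as a single record, so every XPathAll call returns a tuple holding one value per entryInfo; $0, $1 and $2 then pick the first, second and third entry, and UNION combines the three relations into one. Note that UNION does not guarantee the order of the rows.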

Register /home/cloudera/piggybank-0.16.0.jar;

DEFINE XPathAll org.apache.pig.piggybank.evaluation.xml.XPathAll();

--DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

A =  LOAD '/home/cloudera/Parsed_CCD.xml' 
using org.apache.pig.piggybank.storage.XMLLoader('medicationsInfo') as (x:chararray);

B = FOREACH A GENERATE 
XPathAll(x, 'medicationsInfo/entryInfo/statusCode').$0, 
XPathAll(x, 'medicationsInfo/entryInfo/startTime').$0,
XPathAll(x, 'medicationsInfo/entryInfo/endTime').$0,
XPathAll(x, 'medicationsInfo/entryInfo/strengthValue').$0,
XPathAll(x, 'medicationsInfo/entryInfo/strengthUnits').$0;

C = FOREACH A GENERATE 
XPathAll(x, 'medicationsInfo/entryInfo/statusCode').$1, 
XPathAll(x, 'medicationsInfo/entryInfo/startTime').$1,
XPathAll(x, 'medicationsInfo/entryInfo/endTime').$1,
XPathAll(x, 'medicationsInfo/entryInfo/strengthValue').$1,
XPathAll(x, 'medicationsInfo/entryInfo/strengthUnits').$1;

D = FOREACH A GENERATE 
XPathAll(x, 'medicationsInfo/entryInfo/statusCode').$2, 
XPathAll(x, 'medicationsInfo/entryInfo/startTime').$2,
XPathAll(x, 'medicationsInfo/entryInfo/endTime').$2,
XPathAll(x, 'medicationsInfo/entryInfo/strengthValue').$2,
XPathAll(x, 'medicationsInfo/entryInfo/strengthUnits').$2;


BCD = UNION B,C,D;

DUMP BCD;
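
To sanity-check which values those paths select without launching a Pig job, the same selection can be reproduced locally. This is only a rough sketch in Python using the standard xml.etree.ElementTree module, not part of the Pig solution: the helper name medication_rows and the trimmed SAMPLE string (one ProductInfo entry instead of three) are assumptions made for illustration. It collects one list per field, much like one XPathAll tuple per field, and then lines the lists up into rows.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question: all three medicationsInfo
# entries, but only one ProductInfo entry to keep the example short.
SAMPLE = """<Document>
  <medicationsInfo>
    <code>10160-0</code>
    <entryInfo>
      <statusCode>completed</statusCode>
      <startTime>20110729</startTime>
      <endTime>20110822</endTime>
      <strengthValue>24</strengthValue>
      <strengthUnits>h</strengthUnits>
    </entryInfo>
    <entryInfo>
      <statusCode>completed</statusCode>
      <startTime>20120130</startTime>
      <endTime>20120326</endTime>
      <strengthValue>12</strengthValue>
      <strengthUnits>h</strengthUnits>
    </entryInfo>
    <entryInfo>
      <statusCode>completed</statusCode>
      <startTime>20100412</startTime>
      <endTime>20110822</endTime>
      <strengthValue>8</strengthValue>
      <strengthUnits>d</strengthUnits>
    </entryInfo>
  </medicationsInfo>
  <ProductInfo>
    <code>10160-0</code>
    <entryInfo>
      <statusCode>completed</statusCode>
      <startTime>20110729</startTime>
      <endTime>20110822</endTime>
      <strengthValue>24</strengthValue>
      <strengthUnits>h</strengthUnits>
    </entryInfo>
  </ProductInfo>
</Document>"""

FIELDS = ["statusCode", "startTime", "endTime", "strengthValue", "strengthUnits"]


def medication_rows(xml_text):
    """Return one tuple per entryInfo found under medicationsInfo."""
    root = ET.fromstring(xml_text)
    # Work on the medicationsInfo element only, similar to loading that tag
    # as one record. ElementTree paths are relative to the element, so the
    # leading 'medicationsInfo/' step used in the Pig script is dropped here.
    med = root.find("medicationsInfo")
    # One list per field, each holding a value for every entryInfo.
    columns = [[e.text for e in med.findall(f"entryInfo/{field}")]
               for field in FIELDS]
    # Transpose the per-field lists into per-entry rows.
    return list(zip(*columns))


if __name__ == "__main__":
    for row in medication_rows(SAMPLE):
        print(" ".join(row))
```

Running it prints one line per medicationsInfo entry and leaves out the ProductInfo entry, which is the shape the three FOREACH statements plus UNION are meant to produce.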

Upvotes: 0
