user760857

Reputation: 31

How can I build an XML tree with an event-based parser in Perl for a huge amount of data?

I have an XML file like this:

    <Nodes>
      <Node>
        <NodeName>Company</NodeName>
        <File>employee_details.csv</File>
        <data>employee_data.txt</data>
        <Node>
          <NodeName>dummy</NodeName>
          <File>employee_details1.csv</File>
          <data>employee_data1.txt</data>
        </Node>
      </Node>
    </Nodes>

    # Contents of employee_data.txt
    Empname,Empcode,EmpSal:Currency,Empaddr
    # Contents of employee_details.csv (huge, with rows like these)
    Alex,A001,1000:USD,Bangalore
    Aparna,B001,1000:RUBEL,Bombay
    # Contents of employee_data1.txt
    phone,fax
    # Contents of employee_details1.csv (huge, with rows like these)
    44568889,123345656
    23232323,454545757

Output:

    <Company>
    <Empname>Alex</Empname>
    <Empcode>A001</Empcode>
    <EmpSal=USD>1000</EmpSal>
    <Empaddr>Bangalore</Empaddr>
    <phone>44568889</phone>
    <fax>123345656</fax>
    </Company>
    <Company>
    <Empname>Aparna</Empname>
    <Empcode>B001</Empcode>
    <EmpSal=RUBEL>1000</EmpSal>
    <Empaddr>Bombay</Empaddr>
    <phone>23232323</phone>
    <fax>454545757</fax>
    </Company>

I want to build an XML tree with a SAX parser, but I don't understand how to traverse all the nodes and generate the events.

The result should be the output shown above.

How can I do it in Perl?

Upvotes: 2

Views: 257

Answers (3)

Mandar Pande

Reputation: 12984

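In the .pl file:

    use XML::SAX::ParserFactory;
    use sax_handler;

    my $factory = XML::SAX::ParserFactory->new();
    # Pass whatever arguments your handler needs to new().
    my $parser  = $factory->parser( Handler => sax_handler->new() );
    # Then call $parser->parse_uri(...) or $parser->parse_string(...) on your XML.

In sax_handler.pm:

    package sax_handler;
    use strict;
    use warnings;

    # Constructor: nothing special is needed here.
    sub new {
        my ($type) = @_;
        return bless {}, $type;
    }

    # The following two methods are the important ones.
    sub start_element {
        my ($self, $element) = @_;

        # Read the attributes of a tag; m:text is the tag name here.
        if ( $element->{Name} eq "m:text" ) {
            $self->{name} = $element->{Attributes}->{'{}name'}->{'Value'};
        }
    }

    # m:reviewID stands for a tag in your XML.
    sub end_element {
        my ($self, $element) = @_;

        # Write out the tags here, or print or manipulate them.
        if ( $element->{Name} eq "m:reviewID" ) {
        }
    }

    1;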

Upvotes: 2

Mandar Pande

Reputation: 12984

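Well, a SAX parser is slightly different from other parsing techniques: you need to write your own handler (a Perl module). The module must contain the following: 1. a constructor, 2. a start_element subroutine, 3. an end_element subroutine. You can handle events inside these subroutines, for example for a mail_id tag:

    if ( $element->{Name} eq "mail_id" ) { $user_mail_id = $self->get_text(); }

Note that get_text here stands for a helper of your own that returns the text collected so far; as far as I know it is not something XML::SAX provides.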
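To make the handler idea concrete, here is a minimal sketch of such a module. It assumes that text is collected in the characters event and read back by a get_text helper defined in the handler itself. The package name TextCollector, the values hash, and the sample XML string are made up for illustration. The demo at the bottom only runs if XML::SAX is installed.

```perl
use strict;
use warnings;

package TextCollector;    # hypothetical handler name

# Constructor: an empty text buffer and a place to store captured values.
sub new {
    my ($class) = @_;
    return bless { text => '', values => {} }, $class;
}

# Called at each opening tag: start with an empty text buffer.
sub start_element {
    my ($self, $element) = @_;
    $self->{text} = '';
}

# Called for character data, possibly several times per element.
sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};
}

# Called at each closing tag: the buffer now holds the element's text.
sub end_element {
    my ($self, $element) = @_;
    if ( $element->{Name} eq 'mail_id' ) {
        $self->{values}{mail_id} = $self->get_text();
    }
}

# Helper defined by this handler (an assumption, not an XML::SAX method):
# return the text collected since the last opening tag.
sub get_text {
    my ($self) = @_;
    return $self->{text};
}

package main;

# Drive the handler with a real parser when XML::SAX is available.
if ( eval { require XML::SAX::ParserFactory; 1 } ) {
    my $handler = TextCollector->new();
    my $parser  = XML::SAX::ParserFactory->parser( Handler => $handler );
    $parser->parse_string('<user><mail_id>someone@example.com</mail_id></user>');
    print "mail_id: $handler->{values}{mail_id}\n";
}
```

Resetting the buffer in start_element keeps the sketch short, but it only suits elements that contain plain text; mixed content would need a stack of buffers.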

Upvotes: 1

mirod

Reputation: 16171

It looks to me like the CSV files are the ones that can be huge, not the XML file, so there is really no need for a SAX parser. The XML only gives you the location of 4 files. Two of them (the .txt ones) are small and contain only a list of field names; the other two, the CSV files, can be big.

You should use Text::CSV_XS to parse those 2 huge files. You can then output the XML using plain print; just make sure you escape the text and pay attention to the encoding. (BTW, in your sample output <EmpSal=USD> is not well-formed XML: an attribute needs a name and a quoted value, for example <EmpSal Currency="USD">.) Another option is XML::Writer, which will take care of escaping and quoting for you. I don't think generating SAX events and passing them to a SAX writer makes sense in this case; it would be more complex and probably slower than the other options.
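As a rough sketch of the plain-print route, the code below reads the data files row by row in parallel and prints one element per record, so memory use stays at one row per file. To stay free of dependencies it splits on commas by hand, which assumes no quoted fields or embedded commas; Text::CSV_XS should replace read_row for real CSV. The function names (xml_escape, read_row, write_records) and the handling of a Name:Attr header as an attribute are my assumptions, based on the sample data in the question. In real use the field names would come from reading the .txt files.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Read one line and split it on commas. Assumption: no quoted fields or
# embedded commas; use Text::CSV_XS instead for real CSV data.
sub read_row {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    $line =~ s/\r?\n\z//;
    return [ split /,/, $line, -1 ];
}

# Print one <$node> element per record, one child element per field.
# Each source is [ \@field_names, $data_filehandle ]; rows are read in
# parallel, one per source. Field names are assumed to be valid XML names.
sub write_records {
    my ($out, $node, @sources) = @_;
    while (1) {
        my @fields;    # [ name, value ] pairs for the current record
        for my $source (@sources) {
            my ($names, $fh) = @$source;
            my $row = read_row($fh);
            return unless $row;    # stop at the end of the shortest file
            push @fields, map { [ $names->[$_], $row->[$_] ] } 0 .. $#$names;
        }
        print {$out} "<$node>\n";
        for my $field (@fields) {
            my ($name, $value) = @$field;
            $value = '' unless defined $value;
            # A "Name:Attr" header means the value looks like "text:attr".
            my ($tag, $attr) = split /:/, $name, 2;
            if ( defined $attr ) {
                my ($text, $attr_value) = split /:/, $value, 2;
                $text       = '' unless defined $text;
                $attr_value = '' unless defined $attr_value;
                printf {$out} qq{<%s %s="%s">%s</%s>\n},
                    $tag, $attr, xml_escape($attr_value), xml_escape($text), $tag;
            }
            else {
                printf {$out} "<%s>%s</%s>\n", $tag, xml_escape($value), $tag;
            }
        }
        print {$out} "</$node>\n";
    }
}

# Demo: in-memory filehandles stand in for the two CSV files.
my $csv  = "Alex,A001,1000:USD,Bangalore\nAparna,B001,1000:RUBEL,Bombay\n";
my $csv1 = "44568889,123345656\n23232323,454545757\n";
open my $details,  '<', \$csv  or die "Cannot open in-memory file: $!";
open my $details1, '<', \$csv1 or die "Cannot open in-memory file: $!";

write_records(
    \*STDOUT, 'Company',
    [ [qw(Empname Empcode EmpSal:Currency Empaddr)], $details ],
    [ [qw(phone fax)], $details1 ],
);
```

The output is a sequence of Company elements without a single root, as in the question; wrap the call in a print of an opening and closing root tag if the result has to be a well-formed document.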

Upvotes: 1
