Reputation: 1
I need to read lines from an XML file and parse them into fields. By "line" I mean text that starts with < and ends with />; it may sit on a single physical line or span several, separated by CR/LF. Here is a typical line:
<Label Name="lblIncidentTypeContent" Increasable="true" Left="140" Top="60"
Width="146 SpeechField="IncidentType_V" TextAlign="MiddleLeft" WidthPixel="-180"
WidthPercent="50" />
Once I've read the line, I need to parse it into fields such as Name, Left, Width, etc., and output a CSV with the data in a particular order. Then I read the next line, and so on until EOF.
It's been a long time since I did Perl (or any other kind of) programming. Any help is welcome.
Upvotes: 0
Views: 377
Reputation: 57640
Don't view XML as line-based data, as it isn't. Rather, use a good XML parser, of which Perl has plenty.
Do not use XML::Simple!
Its own documentation discourages its use in new code:
The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended.
The major problems with this module are the large number of options and the arbitrary ways in which these options interact - often with unexpected results.
So we're gonna use the XML::LibXML module, which interfaces with the external libxml2 library from the GNOME project. This has the advantage that we can use XPath expressions to query our data. For reading from or writing to CSV, the Text::CSV module should be used.
use strict; use warnings;
use XML::LibXML;
use Text::CSV;
# load the data
my $data = XML::LibXML->load_xml(IO => \*STDIN) or die "Can't parse the XML";
# prepare CSV output:
my $csv = Text::CSV->new({ binary => 1, escape_char => "\\", eol => "\n" });
# Text::CSV doesn't like bareword filehandles
open my $output, '>&:utf8', STDOUT or die "Can't dup STDOUT: $!";
my @cols = qw/ name left width /; # the column names in the CSV
my @attrs = qw/ Name Left Width /; # the corresponding attr names in the XML
# print the header
$csv->print($output, \@cols);
# extract data
for my $label ($data->findnodes('//Label')) {
    my @fields = map { $label->getAttribute($_) } @attrs;
    $csv->print($output, \@fields);
}
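Since XPath was mentioned as the main advantage, here is a minimal sketch of a variation on the loop above that filters in the query itself. The inline sample data, the filter on TextAlign, and the choice to turn missing attributes into empty fields are all assumptions for illustration, not something the question asked for.

```perl
use strict;
use warnings;
use XML::LibXML;
use Text::CSV;

# Hypothetical inline sample; the script above reads its XML from STDIN.
my $xml = <<'XML';
<foo>
  <Label Name="a" Left="1" Width="10" TextAlign="MiddleLeft" />
  <Label Name="b" Left="2" Width="20" TextAlign="TopLeft" />
  <Label Name="c" Width="30" TextAlign="MiddleLeft" />
</foo>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $csv = Text::CSV->new({ binary => 1 });

my @attrs = qw/ Name Left Width /;   # attribute names, in output order
my @lines;

# The predicate [@TextAlign="MiddleLeft"] keeps only matching elements.
for my $label ($doc->findnodes('//Label[@TextAlign="MiddleLeft"]')) {
    # getAttribute returns undef for a missing attribute; map that to ''.
    my @fields = map { $label->getAttribute($_) // '' } @attrs;
    $csv->combine(@fields) or die "Can't build CSV line";
    push @lines, $csv->string;
}

print "$_\n" for @lines;
# prints:
# a,1,10
# c,,30
```

The second Label is dropped by the predicate, and the third one has no Left attribute, so its Left column is empty. The `//` operator needs Perl 5.10 or later.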
Test data (I took the liberty of closing the quote on the Width attribute value):
<foo>
<Label Name="lblIncidentTypeContent" Increasable="true" Left="140" Top="60"
Width="146" SpeechField="IncidentType_V" TextAlign="MiddleLeft" WidthPixel="-180"
WidthPercent="50" />
<Label Name="Another TypeContent" Increasable="true"
Width="123" SpeechField="IncidentType_V"
Left="41,42" Top="13"
TextAlign="TopLeft" WidthPixel="-180"
WidthPercent="50"
/>
</foo>
Output:
name,left,width
lblIncidentTypeContent,140,146
"Another TypeContent","41,42",123
Upvotes: 3
Reputation: 796
Well, this being Perl you have several ways to do it:
Upvotes: 1