I'm trying to parse a CSV file and convert it to XML. The CSV file contains a header line followed by one entry per line, with the fields separated by commas. A sample with two entries looks like this:
License,Date,Mileage
04-nh-pd,17-11-2020,30000
19-tg-jr,17-11-2020,36000
Expected output:
<?xml version="1.0" encoding="UTF-8" ?><ns1:ImportObjectMileage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types">
<ns1:ObjectMileage><ns1:object_code>04-nh-pd</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>30000</ns1:mileage><ns1:icode_mileagecause_ecode>KEUR</ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
<ns1:ObjectMileage><ns1:object_code>19-tg-jr</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>36000</ns1:mileage><ns1:icode_mileagecause_ecode>KEUR</ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
My code so far:
#!perl
use strict;
# Open the ch2_xmlusers.csv file for input
open(CSV_FILE, "ch2_xmlusers.csv") || die "Can't open file: $!";
# Open the ch2_xmlusers.xml file for output
open(XML_FILE, ">ch2_xmlusers.xml") || die "Can't open file: $!";
# Print the initial XML header and the root element
print XML_FILE '<?xml version="1.0" encoding="UTF-8" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types';
my $kenteken = "";
# The while loop to traverse through each line in ch2_xmlusers.csv
while(<CSV_FILE>) {
chomp; # Delete the new line char for each line
# Split each field, on the comma delimiter, into an array
my @fields = split(/,/, $_);
$kenteken .= <<"EOF";
<ns1:ObjectMileage><ns1:object_code>$fields[0]</ns1:object_code><ns1:mileagedate>$fields[1]</ns1:mileagedate><ns1:mileage>$fields[2]</ns1:mileage><ns1:icode_mileagecause_ecode>$fields[3]</ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
EOF
}
print XML_FILE "\n".$kenteken."\n";
# Close all open files
close CSV_FILE;
close XML_FILE;
My output so far:
<?xml version="1.0" encoding="UTF-8" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types
<ns1:ObjectMileage><ns1:object_code>License</ns1:object_code><ns1:mileagedate>Date</ns1:mileagedate><ns1:mileage>Mileage</ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
<ns1:ObjectMileage><ns1:object_code>04-nh-pd</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>30000</ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
<ns1:ObjectMileage><ns1:object_code>19-tg-jr</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>36000</ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
<ns1:ObjectMileage><ns1:object_code></ns1:object_code><ns1:mileagedate></ns1:mileagedate><ns1:mileage></ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
<ns1:ObjectMileage><ns1:object_code></ns1:object_code><ns1:mileagedate></ns1:mileagedate><ns1:mileage></ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
The first line below the header (built from the CSV header) and the last two lines should not appear in the output. The blank lines between the entries are also wrong. Can somebody help me with my script?
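I have made the following changes to your script; see if this works for you.
- Closed the header string properly, so it now ends with ..types">.
- Skipped the CSV header line. One way is to check whether the line matches (=~) License,Date,Mileage and skip it with a next statement; the script below simply reads the first line into $csv_header before the loop, so the loop only sees data lines.
- Instead of appending all the kentekens to one string, the script writes each line with the required fields as soon as it is read from the CSV.
Below is the altered script: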
use strict; use warnings;
no warnings 'uninitialized';
open my $CSV_FILE, "<", "ch2_xmlusers.csv" or die "Cannot open a file: $!";
open my $XML_FILE, ">", "ch2_xmlusers.xml" or die "Cannot open a file: $!";
print $XML_FILE '<?xml version="1.0" encoding="UTF-8" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types">'."\n";
my $kenteken = "";
my $csv_header = <$CSV_FILE>;
while(<$CSV_FILE>) {
chomp;
my @fields = split ',', $_;
$kenteken = <<"EOF";
<ns1:ObjectMileage><ns1:object_code>$fields[0]</ns1:object_code><ns1:mileagedate>$fields[1]</ns1:mileagedate><ns1:mileage>$fields[2]</ns1:mileage><ns1:icode_mileagecause_ecode>$fields[3]</ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
EOF
print $XML_FILE $kenteken;
}
close $CSV_FILE;
close $XML_FILE;
Result:
<?xml version="1.0" encoding="UTF-8" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types">
<ns1:ObjectMileage><ns1:object_code>04-nh-pd</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>30000
</ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
<ns1:ObjectMileage><ns1:object_code>19-tg-jr</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>36000</ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
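The result above still differs from the expected output: the first line is an XML declaration carrying the namespace attributes, instead of <?xml ... ?> followed by the ns1:ImportObjectMileage root element, and icode_mileagecause_ecode stays empty because the CSV has only three columns, so $fields[3] is undefined (the no warnings 'uninitialized' line hides that warning). The line break after 30000 may be a stray carriage return that chomp leaves behind if the file has Windows (CRLF) line endings; that is a guess, not something the output proves. Below is a minimal sketch under those assumptions. It hard-codes KEUR as the cause code for every row (taken from the expected output, since the CSV does not contain it), skips the header with the =~ / next approach, strips CR as well as LF, and closes the root element, which the expected output does not show but a well-formed XML document needs. csv_to_xml is a hypothetical helper name, and the demo reads the sample from an in-memory string instead of ch2_xmlusers.csv.

```perl
use strict;
use warnings;

# Assumption: every row gets the same cause code, because the CSV has only
# three columns and the expected output always shows KEUR.
my $cause_code = 'KEUR';

# Hypothetical helper: reads CSV lines from a filehandle, returns the XML text.
sub csv_to_xml {
    my ($fh_in) = @_;
    # XML declaration followed by the root element, as in the expected output.
    my $xml = '<?xml version="1.0" encoding="UTF-8" ?>'
            . '<ns1:ImportObjectMileage'
            . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
            . ' xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types">' . "\n";
    while (my $line = <$fh_in>) {
        $line =~ s/[\r\n]+\z//;                    # strip LF or CRLF (CRLF input is an assumption)
        next if $line =~ /^License,Date,Mileage/;  # skip the CSV header line
        my ($license, $date, $mileage) = split /,/, $line;
        $xml .= '<ns1:ObjectMileage>'
              . "<ns1:object_code>$license</ns1:object_code>"
              . "<ns1:mileagedate>$date</ns1:mileagedate>"
              . "<ns1:mileage>$mileage</ns1:mileage>"
              . "<ns1:icode_mileagecause_ecode>$cause_code</ns1:icode_mileagecause_ecode>"
              . "</ns1:ObjectMileage>\n";
    }
    # Close the root element so the document is well-formed.
    $xml .= "</ns1:ImportObjectMileage>\n";
    return $xml;
}

# Demo on the sample data from the question, read from an in-memory file.
my $sample = "License,Date,Mileage\n04-nh-pd,17-11-2020,30000\n19-tg-jr,17-11-2020,36000\n";
open my $csv_fh, '<', \$sample or die "Cannot open in-memory file: $!";
print csv_to_xml($csv_fh);
close $csv_fh;
```

To use it on the real files, open ch2_xmlusers.csv for reading and ch2_xmlusers.xml for writing as in the script above, then print $XML_FILE csv_to_xml($CSV_FILE);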
You add 2 newlines in your heredoc, and 2 more when you print it. If you don't want that many newlines, why not remove some of them?
As for your output, consider declaring the variable inside the loop and printing directly:
while (<>) {
...
my $kenteken = ....
print ...
}
That way each new line of input gets a fresh temp variable.
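To make the difference concrete, here is a small sketch comparing the two placements. Your original script prints only once after the loop, so accumulating with .= worked there; the problem appears as soon as you print inside the loop while the variable still lives outside it. The rows are hypothetical stand-ins for the converted CSV lines, and the subroutine names are made up for the comparison.

```perl
use strict;
use warnings;

# Hypothetical rows standing in for the converted CSV lines.
my @rows = ('row1', 'row2', 'row3');

# One variable declared outside the loop and appended to with .= keeps
# growing, so printing it inside the loop would repeat every earlier row.
sub outside_with_append {
    my @printed;
    my $kenteken = '';
    for my $row (@rows) {
        $kenteken .= "$row\n";
        push @printed, $kenteken;   # what "print $kenteken" would emit here
    }
    return @printed;
}

# A variable declared inside the loop starts fresh on every iteration.
sub inside_fresh {
    my @printed;
    for my $row (@rows) {
        my $kenteken = "$row\n";
        push @printed, $kenteken;
    }
    return @printed;
}

print "outside, with .=:\n", outside_with_append();
print "inside, fresh my:\n", inside_fresh();
```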
However, why use a temp variable at all? You can, for example, use printf like this:
printf XML_FILE "<ns1:ObjectMileage><ns1:object_code>%s</ns1:object_code><ns1:mileagedate>%s</ns1:mileagedate><ns1:mileage>%s</ns1:mileage><ns1:icode_mileagecause_ecode>%s</ns1:icode_mileagecause_ecode></ns1:ObjectMileage>\n", @fields;
The usage is printf "%s", $var, where %s is a placeholder for a string supplied by $var. Note that I added a newline \n to the end; this is normally how you print a line.
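A quick way to check what a format produces is sprintf, which takes the same format and arguments as printf but returns the string instead of printing it. Each %s is filled, in order, by the next value in the argument list. Your CSV lines have only three fields while the format has four placeholders, so under use warnings Perl should warn about a missing argument and leave the fourth %s empty; the sketch below passes KEUR explicitly as the fourth value, an assumption taken from the expected output.

```perl
use strict;
use warnings;

# Same record format as the printf above: four %s placeholders.
my $format = '<ns1:ObjectMileage><ns1:object_code>%s</ns1:object_code>'
           . '<ns1:mileagedate>%s</ns1:mileagedate><ns1:mileage>%s</ns1:mileage>'
           . '<ns1:icode_mileagecause_ecode>%s</ns1:icode_mileagecause_ecode>'
           . "</ns1:ObjectMileage>\n";

# Three fields from one sample CSV line.
my @fields = split /,/, '04-nh-pd,17-11-2020,30000';

# The fourth placeholder gets the constant cause code (assumed to be KEUR).
my $line = sprintf $format, @fields, 'KEUR';
print $line;
```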
The two lines at the end with no values in them are probably blank lines in your input file. You would already know this if you had used use warnings in your code. Since you did not, you were not warned about the empty lines in your input; the warnings would have looked like this:
Use of uninitialized value in concatenation (.) or string at ...
You can check the input file lines and skip empty lines to avoid that. For example:
while (<>) {
    next unless /\S/;    # skip lines without non-whitespace characters
    ...
}
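Here is a self-contained sketch of the same check, run on a hypothetical in-memory input that ends with two blank lines (one empty, one containing a single space). Only the two data lines survive, so no empty <ns1:ObjectMileage> records would be written. A line holding only a stray "\r" is also skipped, because \S does not match whitespace. data_lines is a hypothetical helper name.

```perl
use strict;
use warnings;

# Keep only lines that contain at least one non-whitespace character.
sub data_lines {
    my ($fh) = @_;
    my @kept;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;   # skip empty or whitespace-only lines
        chomp $line;
        push @kept, $line;
    }
    return @kept;
}

# Hypothetical input: two data lines followed by two blank lines.
my $input = "04-nh-pd,17-11-2020,30000\n19-tg-jr,17-11-2020,36000\n\n \n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my @lines = data_lines($fh);
close $fh;
print scalar(@lines), " data lines\n";
```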
Now then... with all this said and done, this is not how you should do it. You should (probably) use a CSV module such as Text::CSV to read your input file, and then an XML module to write the output. I am not terribly familiar with these, but if you search, you should find some recommendations; I have heard some people recommend XML::LibXML. Don't post a question asking for module recommendations, though, as that is off topic for Stack Overflow. As noted in the comments, it might be fine to print simple XML the way you have done.
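Printing XML by hand works as long as no value contains a character that is special in XML. If a field could ever contain &, < or > (the sample data does not), it has to be escaped before it is printed; an XML module such as XML::LibXML does this for you when it serializes text. A minimal hand-rolled sketch, with xml_escape as a hypothetical helper rather than a function from any module:

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that would break the markup.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or the entities below get double-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;   # only strictly needed inside attribute values
    return $text;
}

my $field = 'A&B <test>';
printf "<ns1:object_code>%s</ns1:object_code>\n", xml_escape($field);
```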