Toine de L

Reputation: 43

Converting CSV file to XML with Perl

I'm trying to parse a CSV file and convert it to XML. The .csv file consists of a header row followed by a list of entries, with the fields separated by commas. The header and two sample entries look like this:

License,Date,Mileage
04-nh-pd,17-11-2020,30000
19-tg-jr,17-11-2020,36000

Expected output:

<?xml version="1.0" encoding="UTF-8" ?><ns1:ImportObjectMileage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types">
<ns1:ObjectMileage><ns1:object_code>04-nh-pd</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>30000</ns1:mileage><ns1:icode_mileagecause_ecode>KEUR</ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
<ns1:ObjectMileage><ns1:object_code>19-tg-jr</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>36000</ns1:mileage><ns1:icode_mileagecause_ecode>KEUR</ns1:icode_mileagecause_ecode></ns1:ObjectMileage>

My code so far:

#!perl
use strict;
# Open the ch2_xml_users.csv file for input
open(CSV_FILE, "ch2_xmlusers.csv") || die "Can't open file: $!";

# Open the ch2_xmlusers.xml file for output
open(XML_FILE, ">ch2_xmlusers.xml") || die "Can't open file: $!";

# Print the initial XML header and the root element
print XML_FILE '<?xml version="1.0" encoding="UTF-8" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types';


my $kenteken = "";
# The while loop to traverse through each line in users.csv
while(<CSV_FILE>) {
    chomp; # Delete the new line char for each line
    # Split each field, on the comma delimiter, into an array
    my @fields = split(/,/, $_);
  $kenteken .= <<"EOF";
    <ns1:ObjectMileage><ns1:object_code>$fields[0]</ns1:object_code><ns1:mileagedate>$fields[1]</ns1:mileagedate><ns1:mileage>$fields[2]</ns1:mileage><ns1:icode_mileagecause_ecode>$fields[3]</ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
    
EOF
}

print XML_FILE "\n".$kenteken."\n";


# Close all open files
close CSV_FILE;
close XML_FILE;
 

My Output so far:

<?xml version="1.0" encoding="UTF-8" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types
    <ns1:ObjectMileage><ns1:object_code>License</ns1:object_code><ns1:mileagedate>Date</ns1:mileagedate><ns1:mileage>Mileage</ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
    
    <ns1:ObjectMileage><ns1:object_code>04-nh-pd</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>30000</ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
    
    <ns1:ObjectMileage><ns1:object_code>19-tg-jr</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>36000</ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
    
    <ns1:ObjectMileage><ns1:object_code></ns1:object_code><ns1:mileagedate></ns1:mileagedate><ns1:mileage></ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
    
    <ns1:ObjectMileage><ns1:object_code></ns1:object_code><ns1:mileagedate></ns1:mileagedate><ns1:mileage></ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
    


The first record below the header (built from the CSV column names) and the last two (empty) records should not appear in the output. The empty lines between the records are also wrong. Can somebody help me with my script?

Upvotes: 2

Views: 346

Answers (2)

vkk05

Reputation: 3222

I have made the changes below to your script; see if this works for you.

  1. Always use lexical filehandles for file operations.
  2. Close the XML header line with ..types">
  3. There are a couple of ways to skip the header of the CSV file:
    3.1 Read one line above the loop and discard it, which avoids a pattern match on every line (as @simbabque mentioned in the comments).
    3.2 If the CSV line matches (=~) License,Date,Mileage, skip it with a next statement.
  4. Instead of concatenating all records into $kenteken and printing once at the end, print each record with the required fields as soon as its CSV line is read.

Below is the altered script:

use strict; use warnings;

no warnings 'uninitialized';

open my $CSV_FILE, "<", "ch2_xmlusers.csv" or die "Cannot open a file: $!";
open my $XML_FILE, ">", "ch2_xmlusers.xml" or die "Cannot open a file: $!";

print $XML_FILE '<?xml version="1.0" encoding="UTF-8" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types">'."\n";

my $kenteken = "";
my $csv_header = <$CSV_FILE>;

while(<$CSV_FILE>) {
    chomp; 
    my @fields = split ',', $_;
    $kenteken = <<"EOF";
<ns1:ObjectMileage><ns1:object_code>$fields[0]</ns1:object_code><ns1:mileagedate>$fields[1]</ns1:mileagedate><ns1:mileage>$fields[2]</ns1:mileage><ns1:icode_mileagecause_ecode>$fields[3]</ns1:icode_mileagecause_ecode></ns1:ObjectMileage>   
EOF
    print $XML_FILE $kenteken;
}
close $CSV_FILE;
close $XML_FILE;

Result:

<?xml version="1.0" encoding="UTF-8" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.co-maker.nl/LeaseOffice/types">
<ns1:ObjectMileage><ns1:object_code>04-nh-pd</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>30000</ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>
<ns1:ObjectMileage><ns1:object_code>19-tg-jr</ns1:object_code><ns1:mileagedate>17-11-2020</ns1:mileagedate><ns1:mileage>36000</ns1:mileage><ns1:icode_mileagecause_ecode></ns1:icode_mileagecause_ecode></ns1:ObjectMileage>   
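
The script above uses option 3.1. For comparison, option 3.2 could look roughly like the sketch below. It reads the sample data from an in-memory filehandle instead of ch2_xmlusers.csv so that it runs on its own; the names $input and @records are made up for the illustration.

```perl
use strict;
use warnings;

# Sample data from the question, held in a string for this sketch.
my $input = "License,Date,Mileage\n"
          . "04-nh-pd,17-11-2020,30000\n"
          . "19-tg-jr,17-11-2020,36000\n";
open my $in, '<', \$input or die "Cannot open in-memory input: $!";

my @records;
while (my $line = <$in>) {
    chomp $line;
    # Option 3.2: skip the header by matching its exact text.
    next if $line =~ /^License,Date,Mileage$/;
    push @records, $line;
}
close $in;

print "$_\n" for @records;
```

The pattern match runs on every line, which is why reading the header once before the loop (3.1) is usually the simpler choice.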

Upvotes: 1

TLP

Reputation: 67908

You add 2 newlines in your heredoc, and 2 more when you print it. If you don't want that many newlines, why not remove some of them?
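
To see where the newlines come from, here is a small sketch that counts them in two heredocs. The <item> element is a made-up stand-in for the real record; only the blank line before EOF matters.

```perl
use strict;
use warnings;

# Heredoc with an extra blank line before the terminator.
my $with_blank = <<"EOF";
<item>a</item>

EOF

# Heredoc without the extra blank line.
my $without_blank = <<"EOF";
<item>a</item>
EOF

# tr/// with an empty replacement list just counts matching characters.
my $count_with    = ($with_blank    =~ tr/\n//);
my $count_without = ($without_blank =~ tr/\n//);

print "with blank line: $count_with\n";       # prints "with blank line: 2"
print "without blank line: $count_without\n"; # prints "without blank line: 1"
```

Every line inside a heredoc, including an otherwise empty one, ends in a newline, so dropping the blank line before EOF leaves exactly one newline per record.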

As for your output, consider declaring the variable inside the loop and printing directly:

while (<>) {
    ...
    my $kenteken = ....
    print ...
}

That way each new line of input gets a fresh temp variable.
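
Filled in, that outline might look like the sketch below. It is only an illustration: the input and output are in-memory filehandles rather than your files, and the record is shortened to two elements to keep the lines readable.

```perl
use strict;
use warnings;

# Two data rows from the question (header already removed for this sketch).
my $input = "04-nh-pd,17-11-2020,30000\n19-tg-jr,17-11-2020,36000\n";
open my $in, '<', \$input or die "Cannot open in-memory input: $!";

# Write into a string so the result can be inspected afterwards.
my $buffer = '';
open my $xml, '>', \$buffer or die "Cannot open in-memory output: $!";

while (my $line = <$in>) {
    chomp $line;
    my @fields = split /,/, $line;
    # Declared inside the loop: a fresh variable for every record.
    my $kenteken = "<ns1:object_code>$fields[0]</ns1:object_code>"
                 . "<ns1:mileage>$fields[2]</ns1:mileage>\n";
    print $xml $kenteken;
}
close $in;
close $xml;

print $buffer;
```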

However, you can skip the temporary variable altogether, for example by using printf like this:

printf XML_FILE "<ns1:ObjectMileage><ns1:object_code>%s</ns1:object_code><ns1:mileagedate>%s</ns1:mileagedate><ns1:mileage>%s</ns1:mileage><ns1:icode_mileagecause_ecode>%s</ns1:icode_mileagecause_ecode></ns1:ObjectMileage>\n", @fields;

The usage is printf "%s", $var, where %s represents a placeholder for a string that is supplied by $var. Note that I added a newline \n to the end, and this is normally how you print a line.
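
Here is a small self-contained sketch of the same idea, using sprintf so the result can be inspected before printing. The template is shortened to three placeholders, and passing 'KEUR' as the last value is an assumption based on the expected output in the question, where that element is always KEUR.

```perl
use strict;
use warnings;

# One %s per value, filled in left to right.
my $template = "<ns1:object_code>%s</ns1:object_code>"
             . "<ns1:mileage>%s</ns1:mileage>"
             . "<ns1:icode_mileagecause_ecode>%s</ns1:icode_mileagecause_ecode>\n";

# Two values from the CSV line, plus one constant appended to the list.
my @fields = ('04-nh-pd', '30000');
my $line   = sprintf $template, @fields, 'KEUR';

print $line;
```

The number of values has to match the number of %s placeholders. With warnings enabled, Perl complains about a missing argument if the list is too short.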

The two lines at the end which have no values in them are probably blank lines in your input file. You would already know this if you had used use warnings in your code. Since you did not, you were not warned about empty lines in your input, which would have looked like this:

Use of uninitialized value in concatenation (.) or string at ...

You can check the input file lines and skip empty lines to avoid that. For example:

while (<>) {
    next unless /\S/;   # skip lines without non-whitespace characters
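
As a complete, runnable sketch of that check (the input string with its blank and whitespace-only lines is made up for the illustration):

```perl
use strict;
use warnings;

# Two real records, with empty and whitespace-only lines mixed in.
my $input = "04-nh-pd,17-11-2020,30000\n\n   \n19-tg-jr,17-11-2020,36000\n\n";
open my $in, '<', \$input or die "Cannot open in-memory input: $!";

my @kept;
while (my $line = <$in>) {
    next unless $line =~ /\S/;   # skip lines without non-whitespace characters
    chomp $line;
    push @kept, $line;
}
close $in;

print scalar(@kept), " records kept\n";   # prints "2 records kept"
```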

Now, with all this said and done, this is not how you should do it. You should (probably) use a CSV module such as Text::CSV to read your input file, and then use an XML module to write the output. I am not terribly familiar with these, but a search should turn up some recommendations; I have heard XML::LibXML recommended. Don't post a question asking for module recommendations, though, as that is off topic for Stack Overflow. As noted in the comments, printing simple XML by hand, as you have done, might be fine.
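
For what it is worth, a module-based version might look roughly like the sketch below. Treat it as a starting point rather than a vetted solution: it assumes Text::CSV and XML::LibXML are installed from CPAN, reads the sample data from a string instead of your file, and hard-codes KEUR because that is what the expected output in the question shows. The helper name add_field is made up.

```perl
use strict;
use warnings;
use Text::CSV;
use XML::LibXML;

my $ns = 'http://www.co-maker.nl/LeaseOffice/types';

# Sample data from the question, held in a string for this sketch.
my $input = "License,Date,Mileage\n"
          . "04-nh-pd,17-11-2020,30000\n"
          . "19-tg-jr,17-11-2020,36000\n";
open my $in, '<', \$input or die "Cannot open in-memory input: $!";

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
$csv->getline($in);   # read and discard the header row

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElementNS($ns, 'ns1:ImportObjectMileage');
$doc->setDocumentElement($root);

# Hypothetical helper: append <ns1:$name>$text</ns1:$name> to $parent.
sub add_field {
    my ($parent, $name, $text) = @_;
    my $el = $doc->createElementNS($ns, "ns1:$name");
    $el->appendTextNode($text);   # text is escaped on serialization
    $parent->appendChild($el);
    return $el;
}

while (my $row = $csv->getline($in)) {
    my ($license, $date, $mileage) = @$row;
    my $entry = $doc->createElementNS($ns, 'ns1:ObjectMileage');
    $root->appendChild($entry);
    add_field($entry, 'object_code',              $license);
    add_field($entry, 'mileagedate',              $date);
    add_field($entry, 'mileage',                  $mileage);
    add_field($entry, 'icode_mileagecause_ecode', 'KEUR');
}
close $in;

print $doc->toString(1);   # 1 = indent the output
```

The benefit over hand-printed strings is that quoted CSV fields and characters such as & or < in the data are handled by the modules instead of by your own code.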

Upvotes: 3
