Joe Burton
Joe Burton

Reputation: 89

Converting a massive CSV to a specific XML format

I have a pretty massive weight table that is in CSV format that I need to convert to XML. The problem I'm running into is the XML format I need it to go into doesn't mesh nicely with the way the CSV is laid out.

Here is a little example of how CSV is laid out:

WEIGHT BREAK (LBS),WEIGHT BREAK (OZ),CA,AD,AE,AF,AG    
0.06,1,4.14,3.48,3.51,3.62,3.55    
0.13,2,4.59,3.97,4.04,4.24,4.1    
0.19,3,5.04,4.44,4.54,4.86,4.66

And I need to convert this into the following for each country:

<Module code='wtbship' feature='shipping'>
    <Method_Add>  
        <Method>CA</Method>
        <Handling>0.00</Handling>
        <Range>
            <Ceiling>0.06</Ceiling>
            <Rate>4.14</Rate>
        </Range>
        <Range>
            <Ceiling>0.13</Ceiling>
            <Rate>4.59</Rate>
        </Range>
        <Range>
            <Ceiling>0.19</Ceiling>
            <Rate>5.04</Rate>
        </Range>
    </Method_Add>
</Module>

The first element in each row is the weight ceiling I need to grab, and then each country has its own element in each row that contains the price for that weight ceiling.

There are 227 countries in the CSV I'm working with. I know I could do this if I had an array for every country, and as I loop through the CSV I could push the weight break and the price to each respective country array, but that seems like a lot of work and I'm trying to find an easier way.

I tried doing this with perl initially but switched to PHP because I'm a little more familiar with it. It's almost as though I need to loop through the column structure as opposed to the rows but that doesn't seem possible to do.

Can anyone provide any suggestions of what could work, or point me in the right direction?

Upvotes: 0

Views: 145

Answers (2)

hakre
hakre

Reputation: 198199

I think the concept you have is already pretty well. You're only a bit limited as you think that there might be an easier way (tm) which you for sure want to have.

There perhaps even is, but first of all I think you already nailed the problem. This is important, because you can then try to do the solution as efficient as you think it could be. It never works the other way round: Hey I need easy solution. 1.) Easy 2.) Solution. No, it's first you need a solution, then you can think about to make it easy. And you only need to think about making it easy, when you found out that that the solution you have is not easy enough.

So easiness is purely relative to your solution.

So let's look together over what you've got so far:

There are 227 countries in the CSV I'm working with. I know I could do this if I had an array for every country, and as I loop through the CSV I could push the weight break and the price to each respective country array, but that seems like a lot of work and I'm trying to find an easier way.

First consider the limits to find out whether or not your solution is already a solution or only in theory. You write there are 227 countries. Question is, how many rows (records) there are in your CSV file? Is it only three as you gave the example in the question (I guess not) or is it more. How many more? 1 000? 10 000? 100 000? ...?.

As you outline it you've got a product here: countries * records. So if you've got let's say 250 countries and you've got 100 000 records this will be 25 000 000 entries in an array. Depending on how much memory one array entry will take, you would hit a memory limit. So this perhaps might even be the most easy solution you've already got it wouldn't work so would not be a solution at all.

This means despite you're asking for an easier solution you're in reality perhaps even asking for a more complicated solution because your solution so far was too easy so that it even didn't work at all.

Which one that would be and how it can be done in an easy fashion depends a lot on the requirements that are not yet clear from your question like:

  • Is there a certain order of elements that need to be taken based on country data or order of records in the original CSV file?
  • Is the input file expected to change while the transaction progresses or is it save to assume that the CSV file won't change at all?

Depending on these details the algorithm of the solution can differ greatly. Especially if there is an order to preserve / achieve.

For example, is the following XML considered to be valid as well? It technically encodes the same information but might not be suitable:

<Module code='wtbship' feature='shipping'>
    <Method_Add>
        <Method>CA</Method>
        <Handling>0.00</Handling>
        <Range>
            <Ceiling>0.06</Ceiling>
            <Rate>4.14</Rate>
        </Range>
    </Method_Add>
    <Method_Add>
        <Method>CA</Method>
        <Handling>0.00</Handling>
        <Range>
            <Ceiling>0.13</Ceiling>
            <Rate>4.59</Rate>
        </Range>
    </Method_Add>
    <Method_Add>
        <Method>CA</Method>
        <Handling>0.00</Handling>
        <Range>
            <Ceiling>0.19</Ceiling>
            <Rate>5.04</Rate>
        </Range>
    </Method_Add>
</Module>

But if suitable, it might allow to process one record (row) of the CSV file after the other. Which could greatly simplify the code.

This would allow you to encode an iteration that returns all value-sets in value, then country order:

iteration #0: Array
(
    [country] => CA
    [handling] => 0.00
    [ceiling] => 0.06
    [rate] => 4.14
)
iteration #1: Array
(
    [country] => AD
    [handling] => 0.00
    [ceiling] => 0.06
    [rate] => 3.48
)
iteration #2: Array
(
    [country] => AE
    [handling] => 0.00
    [ceiling] => 0.06
    [rate] => 3.51
)

Which could be as easy as implementing a CsvParser as a Generator which usage is pretty straight forward:

$parser = new CsvParser('/path/to/file.csv');

foreach($parser as $record) {
    print_r($record);
}

(output would be similar to the iteration #0 to iteration #1 example above).

As written before, this depends on whether or not the order of the data would suffice. So only an example that this can make a difference - in positive as in negative.

And this example does not cover creating the XML. In that case, this can be easily added with an XmlEncoder:

$parser = new CsvParser('/path/to/file.csv');
$encoder = new XmlEncoder('/path/to/file.xml', $parser);
$encoder->encode();

As this little example shows, it might be even already a quite universal solution as it breaks down the job into two parts:

first is to parse the CSV, second is to encode the XML. There are two objects which take care of that. Interface is with the file-system.

The contract between the parser and the encoder is a) the Traversable interface and b) the format of each record (here an array with four keys).

The output then could be like this:

<?xml version="1.0"?>
<Module>
 <Method_Add>
  <Method>CA</Method>
  <Handling>0.00</Handling>
  <Range>
   <Ceiling>0.06</Ceiling>
   <Rate>4.14</Rate>
  </Range>
 </Method_Add>
 <Method_Add>
  <Method>AD</Method>
  <Handling>0.00</Handling>
  <Range>
   <Ceiling>0.06</Ceiling>
   <Rate>3.48</Rate>
  </Range>
 </Method_Add>
 <Method_Add>
  <Method>AE</Method>
  <Handling>0.00</Handling>
  <Range>
   <Ceiling>0.06</Ceiling>
   <Rate>3.51</Rate>
  </Range>
 </Method_Add>
...

Next to that this might be the wrong order of elements, the example for sure also lacks of some details (like for example the attributes of the Module document element) but it hopefully shows that it's not so easy to answer a question asking for an easier way (tm) because it can not be asked that easily for it.

More of importance is that you verify your solution does actually fit your needs and after you've tested that you make clear at which point you need to obtain a solution that it is that much more complicated that you couldn't easily find the solution for it ;).

You can find a working example off-site here: online demo.

Upvotes: 1

jerdiggity
jerdiggity

Reputation: 3675

Maybe something like this will at least get you pointed in the right direction:

<?php
// Change this to the actual file name of your csv file:
$file = 'csv_file.csv';

print_csv_as_xml($file);

function print_csv_as_xml($file) {
  $xml_output = new SimpleXMLElement("<Module code=\"wtbship\" feature=\"shipping\"></Module>");
  $csv = csv_to_array($file);
  if (!empty($csv)) {
    array_to_xml($csv, $xml_output);
  }
  header('Content-type: application/xml');
  $xml = $xml_output->asXML();
  print $xml;
}

/**
 * @function
 *  Fixes invalid XML characters.
 */
function fix_xml_chars($value) {
  $ret = "";
  $current;
  if (empty($value)) {
    return $ret;
  }
  $length = strlen($value);
  for ($i = 0; $i < $length; $i++) {
    $current = ord($value{$i});
    if (($current == 0x9) || ($current == 0xA) || ($current == 0xD) ||
         (($current >= 0x20) && ($current <= 0xD7FF)) ||
         (($current >= 0xE000) && ($current <= 0xFFFD)) ||
         (($current >= 0x10000) && ($current <= 0x10FFFF))) {
      $ret .= chr($current);
    }
    else {
      $ret .= " ";
    }
  }
  $out = $ret;
  return $out;
}

/**
 * @function
 *  Converts an array to XML.
 */
function array_to_xml($fields, &$xml_fields) {
  if (is_array($fields) && !empty($fields)) {
    foreach ($fields as $key => $value) {
      if (is_array($value)) {
        if (!is_numeric($key)) {
          $subnode = $xml_fields->addChild("$key");
          array_to_xml($value, $subnode);
        }
        else {
          // Set each second-level element as Method_Add 
          $subnode = $xml_fields->addChild("Method_Add");
          array_to_xml($value, $subnode);
        }
      }
      else {
        $xml_fields->addChild("$key", fix_xml_chars(htmlspecialchars("$value")));
      }
    }
  }
}

/**
 * @function
 *  Converts a CSV file to an associative array.
 */
function csv_to_array($file) {
  $rows = array();
  $headers = array();
  if (file_exists($file) && is_readable($file)) {
    $handle = fopen($file, 'r');
    while (!feof($handle)) {
      $row = fgetcsv($handle, 1024, ',', '"');
      if (empty($headers)) {
        $headers = $row;
      }
      elseif (is_array($row)) {
        array_splice($row, count($headers));
        $rows[] = array_combine($headers, $row);
      }
    }
    fclose($handle);
  }
  $out = $rows;
  return $out;  
}

Upvotes: 0

Related Questions