tctc91

Reputation: 1363

Extract table data from Wikipedia and convert into an XML document

Page: http://en.wikipedia.org/wiki/ISO_4217#Active_codes

Is it possible to extract every currency code, currency name and location from the table and, if possible, save them into an XML document like so:

<currency>
    <AED>
        <curr>United Arab Emirates dirham</curr>
        <loc>United Arab Emirates</loc>
    </AED>
</currency>
<currency>
    <AFN>
        <curr>Afghan afghani</curr>
        <loc>Afghanistan</loc>
    </AFN>
</currency>

I'm not sure if this helps, but I found that you can export the wiki page in a somewhat XML-like structure:

http://en.wikipedia.org/wiki/Special:Export/ISO_4217#Active_codes

Thanks.

Upvotes: 2

Views: 1054

Answers (1)

DfKimera

Reputation: 2146

The table is written, and thus available, in wiki markup: http://en.wikipedia.org/w/index.php?title=ISO_4217&action=edit&section=4

You could write a script to parse the wiki markup into an array and build an XML document from that. Try splitting the string by newlines (using explode, for example), and then splitting each line by ||, which separates the table columns.

Something like this:

$currencyList = array();
$source = "<insert wikipedia table code here>";

$rows = explode("\n", $source); // split the table into rows

foreach($rows as $row) {

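    $row = trim($row);

    if($row === '') { continue; } // ignore empty rows
    if($row === '|-') { continue; } // ignore table row separators
    if($row[0] !== '|' || substr($row, 0, 2) === '|}') { continue; } // ignore "{|", "!" header and "|}" lines

    $row = substr($row, 1); // remove the leading "|" from each row

    $cols = explode("||", $row); // split the row into columns
    if(count($cols) < 4) { continue; } // skip rows without the expected columns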

    $currency = array( // clean data and store in associative array
         'code' => trim($cols[0]),
         'number' => trim($cols[1]),
         'digits_after_decimal' => trim($cols[2]),
         'name' => trim($cols[3])
    );

    array_push($currencyList, $currency); // add the read currency to the list

}

var_dump($currencyList); // $currencyList now has a list of associative arrays with your data.

To build your XML, you can try PHP's SimpleXML.
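
A minimal sketch of that last step, assuming $currencyList has the shape produced by the loop above. The two sample entries and the root element name "currencies" are illustrative assumptions: the layout in the question has several top-level <currency> elements, but a well-formed XML document needs a single root. A <loc> element could be added the same way if the locations column is parsed as well.

```php
<?php
// Sample data shaped like the entries the parsing loop stores in $currencyList.
$currencyList = array(
    array('code' => 'AED', 'number' => '784', 'digits_after_decimal' => '2', 'name' => 'United Arab Emirates dirham'),
    array('code' => 'AFN', 'number' => '971', 'digits_after_decimal' => '2', 'name' => 'Afghan afghani'),
);

// "currencies" is a hypothetical root name; XML requires exactly one root element.
$xml = new SimpleXMLElement('<currencies/>');

foreach ($currencyList as $currency) {
    $node = $xml->addChild('currency');             // <currency>
    $codeNode = $node->addChild($currency['code']); //   <AED>
    // Property assignment creates the child and escapes special characters such as "&".
    $codeNode->curr = $currency['name'];            //     <curr>...</curr>
}

$output = $xml->asXML(); // returns the document as a string when called without a filename
echo $output;
```

Passing a filename, for example $xml->asXML('currencies.xml'), writes the document to disk instead of returning it.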

Upvotes: 2
