simplesimon

Reputation: 31

How to split a space separated file?

I am trying to import this:

http://en.wikipedia.org/wiki/List_of_countries_by_continent_%28data_file%29

which has the following format:

AS AF AFG 004 Afghanistan, Islamic Republic of
EU AX ALA 248 Åland Islands
EU AL ALB 008 Albania, Republic of
AF DZ DZA 012 Algeria, People's Democratic Republic of
OC AS ASM 016 American Samoa
EU AD AND 020 Andorra, Principality of
AF AO AGO 024 Angola, Republic of
NA AI AIA 660 Anguilla

If I do

<?php explode(" ", $data); ?>

that works fine, except for countries whose names have more than one word.

How can I split each line so that I get the first 4 fields (the letter and number codes) as separate values and the 5th field as whatever remains?

This is in PHP.

Thank you.

Upvotes: 3

Views: 623

Answers (4)

BMBM

Reputation: 16013

Maybe sscanf can also do what you need:

<?php
// In my example I loaded the data into an array, one line per element;
// FILE_IGNORE_NEW_LINES drops the trailing newline from each line
$lines = file('sscanf_data.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($lines as $line) {
    $data = array();
    // Define the format of the input string and assign the extracted
    // fields to an associative array; %[^\n] reads the rest of the line,
    // spaces and punctuation (including periods) included
    sscanf($line, "%s %s %s %s %[^\n]",
        $data['col_1'],
        $data['col_2'],
        $data['col_3'],
        $data['col_4'],
        $data['col_5']);

    // Dump the array contents
    print_r($data);
}

Output:

Array
(
    [col_1] => AS
    [col_2] => AF
    [col_3] => AFG
    [col_4] => 004
    [col_5] => Afghanistan, Islamic Republic of
)
...

An advantage of storing the data in an associative array is that you already have field-value pairs ready for inserting into the database.
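A more compact variant of the same idea, sketched under the assumption that you are happy to name the columns afterwards: when sscanf is called with only the string and the format, it returns the parsed fields as an array instead of assigning them by reference. Here %[^\n] collects everything up to the end of the line, so multi-word names stay in one field. The column names passed to array_combine are hypothetical; rename them to match your table.

```php
<?php
// Sample lines copied from the question
$lines = [
    "AS AF AFG 004 Afghanistan, Islamic Republic of",
    "EU AX ALA 248 Åland Islands",
];

// Hypothetical column names, pick whatever your table uses
$columns = ['cont', 'alpha2', 'alpha3', 'num', 'name'];

$rows = [];
foreach ($lines as $line) {
    // With no extra arguments, sscanf() returns the 5 parsed fields as an array;
    // %s keeps "004" as a string, so the leading zeros survive
    $fields = sscanf($line, "%s %s %s %s %[^\n]");
    $rows[] = array_combine($columns, $fields);
}

print_r($rows[1]);
```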

Upvotes: 0

Edward Dale

Reputation: 30133

The explode function takes an optional limit parameter. Change your function call to:

<?php explode(" ", $data, 5); ?>

and the country name, spaces included, will be the last (fifth) element of the array.
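Applied to several lines at once, the limit argument can be used like this. This is a sketch that assumes the data uses "\n" line endings and that the 3-letter code is a convenient array key; the variable names are only illustrative.

```php
<?php
// A few lines of the question's data, "\n"-separated
$data = "AS AF AFG 004 Afghanistan, Islamic Republic of\n"
      . "EU AD AND 020 Andorra, Principality of\n"
      . "NA AI AIA 660 Anguilla\n";

$countries = [];
// trim() removes the final newline so explode() yields no empty last line
foreach (explode("\n", trim($data)) as $line) {
    // With a limit of 5, everything after the 4th space stays in the 5th piece
    list($continent, $alpha2, $alpha3, $number, $name) = explode(" ", $line, 5);
    $countries[$alpha3] = $name;
}

print_r($countries);
```

Without the limit, "Andorra, Principality of" would be split into three separate elements; with it, the name comes back whole.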

Upvotes: 11

Matthew Flaschen

Reputation: 284796

Using unpack:

$format = "A2cont/x/A2alpha2/x/A3alpha3/x/A3num/x/a*eng";
$line = "AS AF AFG 004 Afghanistan, Islamic Republic of";
$ar = unpack($format, $line);

It produces:

array (
  'cont' => 'AS',
  'alpha2' => 'AF',
  'alpha3' => 'AFG',
  'num' => '004',
  'eng' => 'Afghanistan, Islamic Republic of',
)

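This has the advantage of producing an associative array (the names in the format string, such as cont and alpha2, become the array keys), and of emitting a warning if the input is invalid. Note that it relies on the first four columns always having the fixed widths shown above.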
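A sketch of the same format run over a few of the question's lines, including the one with a non-ASCII name. It assumes every line has the fixed column widths from the sample (2, 2, 3 and 3 characters, each followed by one space). unpack() works on bytes, but since the multibyte "Å" only appears in the final a* field, the offsets of the first four columns are unaffected.

```php
<?php
$format = "A2cont/x/A2alpha2/x/A3alpha3/x/A3num/x/a*eng";

// Sample lines copied from the question
$lines = [
    "AS AF AFG 004 Afghanistan, Islamic Republic of",
    "EU AX ALA 248 Åland Islands",
    "OC AS ASM 016 American Samoa",
];

$rows = [];
foreach ($lines as $line) {
    // A2/A3 read fixed-width fields, x skips the separating space,
    // a* takes all remaining bytes as the English name
    $rows[] = unpack($format, $line);
}

print_r($rows[1]);
```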

Upvotes: 3

konradowy

Reputation: 1570

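You can use preg_match; the country name will then be in $match[5]:

<?php
$str = 'AS AF AFG 004 Afghanistan, Islamic Republic of';
// Anchored at both ends; (.*)$ captures everything after the 4th space,
// so the whole multi-word country name ends up in group 5
$matched = preg_match('/^([A-Z]+) ([A-Z]+) ([A-Z]+) ([0-9]+) (.*)$/', $str, $match);
print_r($match);
?>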
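If you would rather get string keys (as with the unpack answer), PCRE named groups can be used in the same pattern. This is a sketch; the group names are hypothetical, and the fixed widths {2}, {2}, {3}, {3} assume the column sizes shown in the question. Named groups are still numbered too, so $match[5] keeps working.

```php
<?php
$pattern = '/^(?P<cont>[A-Z]{2}) (?P<alpha2>[A-Z]{2}) (?P<alpha3>[A-Z]{3}) (?P<num>[0-9]{3}) (?P<name>.+)$/';
$str = 'EU AL ALB 008 Albania, Republic of';

if (preg_match($pattern, $str, $match)) {
    echo $match['name'], "\n"; // prints "Albania, Republic of"
    echo $match['num'], "\n";  // prints "008" (still a string, zeros kept)
}
```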

Upvotes: 0
