Sam Arul Raj T

Reputation: 1770

separated strings into an array?

From the given string $codes, I want to put every language into a language array, every code into a code array, and every family into a family array. How can I do this in PHP? I tried using DOM, but that did not work. Any other approach would be appreciated. Thanks in advance.

<?php
 $codes = '<pre>
 LANGUAGE      CODE     LANGUAGE FAMILY

AFAR            AA     HAMITIC
ABKHAZIAN       AB     IBERO-CAUCASIAN
AFRIKAANS       AF     GERMANIC
AMHARIC         AM     SEMITIC
ARABIC          AR     SEMITIC
ASSAMESE        AS     INDIAN
AYMARA          AY     AMERINDIAN
AZERBAIJANI     AZ     TURKIC/ALTAIC
BASHKIR         BA     TURKIC/ALTAIC
BYELORUSSIAN    BE     SLAVIC
BULGARIAN       BG     SLAVIC
BIHARI          BH     INDIAN
BISLAMA         BI     [not given]
BENGALI;BANGLA  BN     INDIAN
TIBETAN         BO     ASIAN
BRETON          BR     CELTIC
CATALAN         CA     ROMANCE
CORSICAN        CO     ROMANCE
CZECH           CS     SLAVIC
WELSH           CY     CELTIC
DANISH          DA     GERMANIC
GERMAN          DE     GERMANIC
BHUTANI         DZ     ASIAN
GREEK           EL     LATIN/GREEK
ENGLISH         EN     GERMANIC
ESPERANTO       EO     INTERNATIONAL AUX.
SPANISH         ES     ROMANCE
ESTONIAN        ET     FINNO-UGRIC
BASQUE          EU     BASQUE
PERSIAN (farsi) FA     IRANIAN
FINNISH         FI     FINNO-UGRIC
FIJI            FJ     OCEANIC/INDONESIAN
FAROESE         FO     GERMANIC
FRENCH          FR     ROMANCE
FRISIAN         FY     GERMANIC
IRISH           GA     CELTIC
SCOTS GAELIC    GD     CELTIC
GALICIAN        GL     ROMANCE
GUARANI         GN     AMERINDIAN
GUJARATI        GU     INDIAN
HAUSA           HA     NEGRO-AFRICAN
HEBREW          HE     SEMITIC [*Changed 1989 from original ISO 639:1988, IW] 
HINDI           HI     INDIAN
CROATIAN        HR     SLAVIC
HUNGARIAN       HU     FINNO-UGRIC
ARMENIAN        HY     INDO-EUROPEAN (OTHER)
INTERLINGUA     IA     INTERNATIONAL AUX.
INTERLINGUE     IE     INTERNATIONAL AUX.
INUPIAK         IK     ESKIMO
INDONESIAN      ID     OCEANIC/INDONESIAN [*Changed 1989 from original ISO 639:1988, IN] 
ICELANDIC       IS     GERMANIC
ITALIAN         IT     ROMANCE
INUKTITUT       IU     [        ]
JAPANESE        JA     ASIAN
JAVANESE        JV     OCEANIC/INDONESIAN
GEORGIAN        KA     IBERO-CAUCASIAN
KAZAKH          KK     TURKIC/ALTAIC
GREENLANDIC     KL     ESKIMO
CAMBODIAN       KM     ASIAN
KANNADA         KN     DRAVIDIAN
KOREAN          KO     ASIAN
KASHMIRI        KS     INDIAN
KURDISH         KU     IRANIAN
KIRGHIZ         KY     TURKIC/ALTAIC
LATIN           LA     LATIN/GREEK
LINGALA         LN     NEGRO-AFRICAN
LAOTHIAN        LO     ASIAN
LITHUANIAN      LT     BALTIC
LATVIAN;LETTISH LV     BALTIC
MALAGASY        MG     OCEANIC/INDONESIAN
MAORI           MI     OCEANIC/INDONESIAN
MACEDONIAN      MK     SLAVIC
MALAYALAM       ML     DRAVIDIAN
MONGOLIAN       MN     [not given]
MOLDAVIAN       MO     ROMANCE
MARATHI         MR     INDIAN
MALAY           MS     OCEANIC/INDONESIAN
MALTESE         MT     SEMITIC
BURMESE         MY     ASIAN
NAURU           NA     [not given]
NEPALI          NE     INDIAN
DUTCH           NL     GERMANIC
NORWEGIAN       NO     GERMANIC
OCCITAN         OC     ROMANCE
AFAN (OROMO)    OM     HAMITIC
ORIYA           OR     INDIAN
PUNJABI         PA     INDIAN
POLISH          PL     SLAVIC
PASHTO;PUSHTO   PS     IRANIAN
PORTUGUESE      PT     ROMANCE
QUECHUA         QU     AMERINDIAN
RHAETO-ROMANCE  RM     ROMANCE
KURUNDI         RN     NEGRO-AFRICAN
ROMANIAN        RO     ROMANCE
RUSSIAN         RU     SLAVIC
KINYARWANDA     RW     NEGRO-AFRICAN
SANSKRIT        SA     INDIAN
SINDHI          SD     INDIAN
SANGHO          SG     NEGRO-AFRICAN
SERBO-CROATIAN  SH     SLAVIC
SINGHALESE      SI     INDIAN
SLOVAK          SK     SLAVIC
SLOVENIAN       SL     SLAVIC
SAMOAN          SM     OCEANIC/INDONESIAN
SHONA           SN     NEGRO-AFRICAN
SOMALI          SO     HAMITIC
ALBANIAN        SQ     INDO-EUROPEAN (OTHER)
SERBIAN         SR     SLAVIC
SISWATI         SS     NEGRO-AFRICAN
SESOTHO         ST     NEGRO-AFRICAN
SUNDANESE       SU     OCEANIC/INDONESIAN
SWEDISH         SV     GERMANIC
SWAHILI         SW     NEGRO-AFRICAN
TAMIL           TA     DRAVIDIAN
TELUGU          TE     DRAVIDIAN
TAJIK           TG     IRANIAN
THAI            TH     ASIAN
TIGRINYA        TI     SEMITIC
TURKMEN         TK     TURKIC/ALTAIC
TAGALOG         TL     OCEANIC/INDONESIAN
SETSWANA        TN     NEGRO-AFRICAN
TONGA           TO     OCEANIC/INDONESIAN
TURKISH         TR     TURKIC/ALTAIC
TSONGA          TS     NEGRO-AFRICAN
TATAR           TT     TURKIC/ALTAIC
TWI             TW     NEGRO-AFRICAN
UIGUR           UG     [       ]
UKRAINIAN       UK     SLAVIC
URDU            UR     INDIAN
UZBEK           UZ     TURKIC/ALTAIC
VIETNAMESE      VI     ASIAN
VOLAPUK         VO     INTERNATIONAL AUX.
WOLOF           WO     NEGRO-AFRICAN
XHOSA           XH     NEGRO-AFRICAN
YIDDISH         YI     GERMANIC [*Changed 1989 from original ISO 639:1988, JI] 
YORUBA          YO     NEGRO-AFRICAN
ZHUANG          ZA     [       ]
CHINESE         ZH     ASIAN
ZULU            ZU     NEGRO-AFRICAN
</pre>';

$doc = new DOMDocument();
$doc->loadHTML($codes);

$xmlL = simplexml_import_dom($doc);
$pathL = $xmlL->xpath('//pre');
print_r($pathL);

?>

Upvotes: 0

Views: 176

Answers (2)

kappa

Reputation: 1569

I think you should take a look at PHP's explode function.

With it you can first split on the "\n" character to separate the lines, which gives you the first array. Then, for each line, you can explode on "\t" (supposing tabs separate your data) to get an array with 3 entries, and push each of those entries into the array you want.

Something like:

$codes_array = array();
// Split the string into lines, then split each line into its fields.
foreach (explode("\n", $codes) as $line) {
    $codes_array[] = explode("\t", $line);
}
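To finish the idea of pushing each field into its own array, here is a minimal sketch. It assumes tab-separated data, as supposed above; the `$sample` string and the array names are made up for illustration, and the real `$codes` may well be padded with spaces instead, in which case splitting on "\t" will not work.

```php
<?php
// Hypothetical tab-separated sample; the real $codes may use spaces instead.
$sample = "AFAR\tAA\tHAMITIC\nSCOTS GAELIC\tGD\tCELTIC\n";

$languages = array();
$codes_only = array();
$families = array();

foreach (explode("\n", trim($sample)) as $line) {
    $fields = explode("\t", $line);
    if (count($fields) !== 3) {
        continue; // skip headers, blank lines, or anything malformed
    }
    // Push each field into its own array.
    $languages[]  = trim($fields[0]);
    $codes_only[] = trim($fields[1]);
    $families[]   = trim($fields[2]);
}

print_r($languages);
```

The `count($fields) !== 3` check is there so that lines like `<pre>` or empty lines are skipped rather than producing half-filled rows.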

Upvotes: 1

Timothée Groleau

Reputation: 1960

The list is obviously generated, so you'd have better luck fixing the generator. But if you're stuck with this one list, the code below should parse it the way you want:

$langs_ar = array();
$codes_ar = array();
$families_ar = array();

foreach (preg_split('/[\r\n]+/', $codes) as $line)
{
    if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $matches))
    {
        $langs_ar[] = $matches[1];
        $codes_ar[] = $matches[2];
        $families_ar[] = $matches[3];
    }
}

Oh, and instead of 3 arrays, I'd recommend one array storing hashes for the 3 fields; that, or make your own objects with the 3 properties lang, code, and family.
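A minimal sketch of the single-array idea, reusing the same regex as in the loop above. The shortened `$sample` string, the `$languages` variable, and the key names `lang`, `code`, and `family` are only illustrative:

```php
<?php
// Shortened sample in the same layout as $codes (illustrative only).
$sample = "AFAR            AA     HAMITIC\n"
        . "SCOTS GAELIC    GD     CELTIC\n"
        . "BISLAMA         BI     [not given]\n";

$languages = array();

foreach (preg_split('/[\r\n]+/', $sample) as $line) {
    // Same pattern as above: language, 2-character code, family.
    if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $m)) {
        // One associative array ("hash") per row instead of three parallel arrays.
        $languages[] = array(
            'lang'   => $m[1],
            'code'   => $m[2],
            'family' => $m[3],
        );
    }
}

print_r($languages);
```

Keeping the three fields together in one row means they cannot drift out of sync, which is the main risk with three parallel arrays.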

Edit: a much shorter way to do the same is this:

preg_match_all('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m', $codes, $matches, PREG_SET_ORDER);
var_dump($matches);

$matches is now an array of "objects", one per matching line, where the indexes are:

  • 0 is the full line
  • 1 is the language
  • 2 is the code
  • 3 is the family

Just iterate over that to do whatever you want.
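For instance, the three arrays from the question can be pulled out of the PREG_SET_ORDER result with array_column (available from PHP 5.5). This is only a sketch on a shortened, illustrative `$sample` string rather than the full `$codes`:

```php
<?php
// Shortened sample in the same layout as $codes (illustrative only).
$sample = "AFAR            AA     HAMITIC\n"
        . "PERSIAN (farsi) FA     IRANIAN\n"
        . "ESPERANTO       EO     INTERNATIONAL AUX.\n";

preg_match_all('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m', $sample, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each element of $matches is one row:
// index 0 is the full match, 1 the language, 2 the code, 3 the family.
$langs_ar    = array_column($matches, 1);
$codes_ar    = array_column($matches, 2);
$families_ar = array_column($matches, 3);

// Or simply walk the rows one by one.
foreach ($matches as $row) {
    echo $row[2], ': ', $row[1], ' (', $row[3], ")\n";
}
```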

Upvotes: 1
