Reputation: 1770
From the string in $codes below, I want to collect every language into a language array, every code into a code array, and every family into a family array. How can I do this in PHP? I tried using DOM, but that did not get me there. Any other approach would be appreciated. Thanks in advance.
<?php
$codes = '<pre>
LANGUAGE CODE LANGUAGE FAMILY
AFAR AA HAMITIC
ABKHAZIAN AB IBERO-CAUCASIAN
AFRIKAANS AF GERMANIC
AMHARIC AM SEMITIC
ARABIC AR SEMITIC
ASSAMESE AS INDIAN
AYMARA AY AMERINDIAN
AZERBAIJANI AZ TURKIC/ALTAIC
BASHKIR BA TURKIC/ALTAIC
BYELORUSSIAN BE SLAVIC
BULGARIAN BG SLAVIC
BIHARI BH INDIAN
BISLAMA BI [not given]
BENGALI;BANGLA BN INDIAN
TIBETAN BO ASIAN
BRETON BR CELTIC
CATALAN CA ROMANCE
CORSICAN CO ROMANCE
CZECH CS SLAVIC
WELSH CY CELTIC
DANISH DA GERMANIC
GERMAN DE GERMANIC
BHUTANI DZ ASIAN
GREEK EL LATIN/GREEK
ENGLISH EN GERMANIC
ESPERANTO EO INTERNATIONAL AUX.
SPANISH ES ROMANCE
ESTONIAN ET FINNO-UGRIC
BASQUE EU BASQUE
PERSIAN (farsi) FA IRANIAN
FINNISH FI FINNO-UGRIC
FIJI FJ OCEANIC/INDONESIAN
FAROESE FO GERMANIC
FRENCH FR ROMANCE
FRISIAN FY GERMANIC
IRISH GA CELTIC
SCOTS GAELIC GD CELTIC
GALICIAN GL ROMANCE
GUARANI GN AMERINDIAN
GUJARATI GU INDIAN
HAUSA HA NEGRO-AFRICAN
HEBREW HE SEMITIC [*Changed 1989 from original ISO 639:1988, IW]
HINDI HI INDIAN
CROATIAN HR SLAVIC
HUNGARIAN HU FINNO-UGRIC
ARMENIAN HY INDO-EUROPEAN (OTHER)
INTERLINGUA IA INTERNATIONAL AUX.
INTERLINGUE IE INTERNATIONAL AUX.
INUPIAK IK ESKIMO
INDONESIAN ID OCEANIC/INDONESIAN [*Changed 1989 from original ISO 639:1988, IN]
ICELANDIC IS GERMANIC
ITALIAN IT ROMANCE
INUKTITUT IU [ ]
JAPANESE JA ASIAN
JAVANESE JV OCEANIC/INDONESIAN
GEORGIAN KA IBERO-CAUCASIAN
KAZAKH KK TURKIC/ALTAIC
GREENLANDIC KL ESKIMO
CAMBODIAN KM ASIAN
KANNADA KN DRAVIDIAN
KOREAN KO ASIAN
KASHMIRI KS INDIAN
KURDISH KU IRANIAN
KIRGHIZ KY TURKIC/ALTAIC
LATIN LA LATIN/GREEK
LINGALA LN NEGRO-AFRICAN
LAOTHIAN LO ASIAN
LITHUANIAN LT BALTIC
LATVIAN;LETTISH LV BALTIC
MALAGASY MG OCEANIC/INDONESIAN
MAORI MI OCEANIC/INDONESIAN
MACEDONIAN MK SLAVIC
MALAYALAM ML DRAVIDIAN
MONGOLIAN MN [not given]
MOLDAVIAN MO ROMANCE
MARATHI MR INDIAN
MALAY MS OCEANIC/INDONESIAN
MALTESE MT SEMITIC
BURMESE MY ASIAN
NAURU NA [not given]
NEPALI NE INDIAN
DUTCH NL GERMANIC
NORWEGIAN NO GERMANIC
OCCITAN OC ROMANCE
AFAN (OROMO) OM HAMITIC
ORIYA OR INDIAN
PUNJABI PA INDIAN
POLISH PL SLAVIC
PASHTO;PUSHTO PS IRANIAN
PORTUGUESE PT ROMANCE
QUECHUA QU AMERINDIAN
RHAETO-ROMANCE RM ROMANCE
KURUNDI RN NEGRO-AFRICAN
ROMANIAN RO ROMANCE
RUSSIAN RU SLAVIC
KINYARWANDA RW NEGRO-AFRICAN
SANSKRIT SA INDIAN
SINDHI SD INDIAN
SANGHO SG NEGRO-AFRICAN
SERBO-CROATIAN SH SLAVIC
SINGHALESE SI INDIAN
SLOVAK SK SLAVIC
SLOVENIAN SL SLAVIC
SAMOAN SM OCEANIC/INDONESIAN
SHONA SN NEGRO-AFRICAN
SOMALI SO HAMITIC
ALBANIAN SQ INDO-EUROPEAN (OTHER)
SERBIAN SR SLAVIC
SISWATI SS NEGRO-AFRICAN
SESOTHO ST NEGRO-AFRICAN
SUNDANESE SU OCEANIC/INDONESIAN
SWEDISH SV GERMANIC
SWAHILI SW NEGRO-AFRICAN
TAMIL TA DRAVIDIAN
TELUGU TE DRAVIDIAN
TAJIK TG IRANIAN
THAI TH ASIAN
TIGRINYA TI SEMITIC
TURKMEN TK TURKIC/ALTAIC
TAGALOG TL OCEANIC/INDONESIAN
SETSWANA TN NEGRO-AFRICAN
TONGA TO OCEANIC/INDONESIAN
TURKISH TR TURKIC/ALTAIC
TSONGA TS NEGRO-AFRICAN
TATAR TT TURKIC/ALTAIC
TWI TW NEGRO-AFRICAN
UIGUR UG [ ]
UKRAINIAN UK SLAVIC
URDU UR INDIAN
UZBEK UZ TURKIC/ALTAIC
VIETNAMESE VI ASIAN
VOLAPUK VO INTERNATIONAL AUX.
WOLOF WO NEGRO-AFRICAN
XHOSA XH NEGRO-AFRICAN
YIDDISH YI GERMANIC [*Changed 1989 from original ISO 639:1988, JI]
YORUBA YO NEGRO-AFRICAN
ZHUANG ZA [ ]
CHINESE ZH ASIAN
ZULU ZU NEGRO-AFRICAN
</pre>';
$doc= new DOMDocument();
$doc->loadHTML($codes);
$xmlL = simplexml_import_dom($doc);
$pathL = $xmlL->xpath('//pre');
print_r($pathL);
?>
Upvotes: 0
Views: 176
Reputation: 1569
I think you should take a look at PHP's explode function.
With it you can first split on the "\n" character to separate the lines, which gives you the first array. Then, for each line, you can explode on "\t" (supposing your columns are separated by tabs) to get an array with 3 separate entries, and push each of those entries into the array you want.
Something like:
$codes_array = array();
foreach (explode("\n", $codes) as $line) {
    $codes_array[] = explode("\t", $line);
}
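To end up with the three separate arrays the question asks for, the same idea can be extended as in the sketch below. It assumes the columns really are tab-separated, which the question does not confirm; the three-line $sample string and the names $languages, $codeList, and $families are made up for illustration.

```php
<?php
// Hypothetical excerpt of the list; the columns are ASSUMED to be tab-separated.
$sample = "AFAR\tAA\tHAMITIC\nSCOTS GAELIC\tGD\tCELTIC\nBISLAMA\tBI\t[not given]";

$languages = array();
$codeList  = array();
$families  = array();

foreach (explode("\n", $sample) as $line) {
    // Split one line into its columns.
    $fields = explode("\t", trim($line));

    // Skip blank lines and anything that does not have exactly three columns.
    if (count($fields) !== 3) {
        continue;
    }

    $languages[] = $fields[0];
    $codeList[]  = $fields[1];
    $families[]  = $fields[2];
}

print_r($languages);
```

If the columns turn out to be separated by plain spaces instead, this split will not work, because some language and family names contain spaces themselves.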
Upvotes: 1
Reputation: 1960
The list is obviously generated, so you'd have better luck fixing the generator. But if you're stuck with this one list, the code below should parse it the way you want:
$langs_ar = array();
$codes_ar = array();
$families_ar = array();
foreach (preg_split('/[\r\n]+/', $codes) as $line)
{
    // Capture groups: 1 = language (one or two words), 2 = two-character code, 3 = family (rest of the line)
    if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $matches))
    {
        $langs_ar[] = $matches[1];
        $codes_ar[] = $matches[2];
        $families_ar[] = $matches[3];
    }
}
Oh, and instead of 3 arrays, I'd recommend a single array that stores one associative array per line with the 3 fields; that, or make your own objects with the 3 properties lang, code, and family.
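A minimal sketch of that single-array layout, reusing the same pattern, might look like this. The $sample excerpt and the names $rows and $byCode are illustrative only, and the array_column call assumes PHP 5.5 or later.

```php
<?php
// Hypothetical excerpt of the list from the question, header line included.
$sample = "LANGUAGE CODE LANGUAGE FAMILY\nAFAR AA HAMITIC\nSCOTS GAELIC GD CELTIC\nINUKTITUT IU [ ]";

$rows = array();
foreach (preg_split('/[\r\n]+/', $sample) as $line) {
    // The header line does not fit the pattern, so it is skipped.
    if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $m)) {
        $rows[] = array('lang' => $m[1], 'code' => $m[2], 'family' => $m[3]);
    }
}

// Optional: re-index the rows by code for direct lookup (array_column needs PHP 5.5+).
$byCode = array_column($rows, null, 'code');

echo $byCode['GD']['lang'], "\n";
```

Keeping the three fields together in one row means they cannot drift out of step, which can happen with three parallel arrays.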
Edit: a much shorter way to do the same is this:
preg_match_all('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m', $codes, $matches, PREG_SET_ORDER);
var_dump($matches);
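$matches is now an array with one entry per matched line. In each entry, index 0 holds the whole matched line, index 1 the language, index 2 the code, and index 3 the family.
Just iterate over it to do whatever you want.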
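As a rough sketch, iterating over the PREG_SET_ORDER result to fill the three arrays from the question could look like this. The $sample excerpt and the array names are assumptions made for illustration.

```php
<?php
// Hypothetical excerpt of the list from the question.
$sample = "AFAR AA HAMITIC\nPERSIAN (farsi) FA IRANIAN\nBISLAMA BI [not given]\n";

preg_match_all('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m', $sample, $matches, PREG_SET_ORDER);

$languages = array();
$codeList  = array();
$families  = array();

foreach ($matches as $row) {
    // $row[0] is the whole matched line; $row[1] to $row[3] are the captured fields.
    $languages[] = $row[1];
    $codeList[]  = $row[2];
    $families[]  = $row[3];
}

print_r($codeList);
```

Note that the family group captures everything up to the end of the line, so bracketed remarks such as the "[*Changed 1989 ...]" notes in the full list would end up inside the family value.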
Upvotes: 1