Reputation: 963
I have asked several questions about this and tried many different things, but I am still not completely happy with the result. I have a lot of data in the following format:
3*O#AA6160 F7 A7 P7 J7 R7 D7 I7 Y7 LHRMIA 1040 1455 * 744 0E
B7 H0 W0 K0 M0 L0 V0 G0 S0 Q0 N0 O0
The spaces you see on the second row are there by default. Essentially, from that string I am trying to get the following
$flightNumber = "AA6160";
$from = "LHR";
$to = "MIA";
$other = "1040 1455 * 744 0E";
$seats = array(
    "F" => 7,
    "A" => 7,
    "P" => 7,
    "J" => 7,
    "R" => 7,
    "D" => 7,
    "I" => 7,
    "Y" => 7,
    "B" => 7,
    "H" => 0,
    "W" => 0,
    "K" => 0,
    "M" => 0,
    "L" => 0,
    "V" => 0,
    "G" => 0,
    "S" => 0,
    "Q" => 0,
    "N" => 0,
    "O" => 0,
);
The rules are as follows.
Each record starts with a digit (3 in the example above). The second row is a continuation of the seats from the first row. If I were to post the full data, the third row would start with 4, which means it is not related to the two rows above.
A flight number always starts with a # followed by two letters and 1-4 digits. Sometimes there are spaces between the letters and the digits. These are all the forms of flight number I have come across so far:
#AA6160
#AA 57
#BA 207
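The flight-number rule above can be sketched as a small pattern. This is only a minimal illustration, assuming the rule is exactly "#, two uppercase letters, optional whitespace, 1-4 digits"; the function name extractFlightNumber is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: returns the flight number without the '#' and
// without any spaces (e.g. "AA57"), or null when nothing matches.
function extractFlightNumber(string $row): ?string
{
    // '#', two uppercase letters, optional whitespace, 1-4 digits,
    // and no further digit directly after them.
    if (preg_match('/#([A-Z]{2})\s*(\d{1,4})(?!\d)/', $row, $m) === 1) {
        return $m[1] . $m[2];
    }
    return null;
}

// The three forms listed above.
foreach (['#AA6160', '#AA 57', '#BA 207'] as $sample) {
    echo extractFlightNumber($sample), "\n";
}
```

The two captures are concatenated so that all three forms end up in the same space-free shape.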
The second row will only contain a continuation of seats, nothing else. This is what I have come up with so far
while ( $elNum < $elements->length ) {
    $flightInfo = $elements->item($elNum)->nodeValue;
    if (preg_match('/^\\d/', $flightInfo) === 1) {
        if(preg_match('/(\d)+[^#]*?\#(\p{Lu}{2})\s*(\d{1,4})\b\s*([\w. ]+?)(?=\s+\p{Lu}{6})\s([A-Z]{3})([A-Z]{3})(.+)/', $flightInfo, $matches)===1){
            $row = $matches[1];
            $fltcode = $matches[2].$matches[3];
            $ffrom = $matches[5];
            $fto = $matches[6];
            $other = $matches[7];
            $this->flights[$fltcode] = array(
                "command" => $terminal_command,
                "row" => $row,
                "flightNumber" => $fltcode,
                "from" => $ffrom,
                "to" => $fto,
                "other" => $other
            );
        }
    }
    ++$elNum;
}
The main thing I am struggling with is the seats. I am not sure how to get the ones I need from the first row and combine them with the ones from the second row in the output format I need.
I am not even sure whether regex is the best option here, or whether I should explode everything on spaces and sort it out from there.
Any advice on the situation is appreciated. Here is some additional data
5*S#DL4386 J9 C9 D9 I9 Z9 W9 Y9 B9 LHRMIA 1235 1705 * 744 0E
M9 S9 H9 Q9 K9 L9 U9 T9 X9 V9
6 #VS 5 J9 C9 D9 I9 Z9 W9 S9 H9 LHRMIA 1235 1705 744 0E
K9 Y9 B9 R9 L9 U9 M9 E9 Q9 X9 N9 O9
7 #IB4637 F9 A9 J9 C9 D9 R9 I. W9 LHRMIA 1415 1825 * 744 0E
Z. Y9 B9 H9 K. M. L. V. S. N. Q. O.
Thanks
Upvotes: 0
Views: 84
Reputation: 89547
Here is an example that uses XMLReader instead of DOMDocument for the XML parsing, because it is faster and uses less memory. The patterns are designed to be more readable (with the free-spacing mode and named captures) and more efficient (with anchors, and without useless Unicode character classes like \p{Lu}, lookaheads, or unused capturing groups).
$xml = <<<EOD
<?xml version="1.0" encoding="utf-8" ?>
<root xmlns:terminal="http://test.com/terminal">
<terminal:Text>1 #AY5767 F9 A9 P. J9 C9 D9 I9 Y9 LHRMIA 0945 1410 * 777 0E</terminal:Text>
<terminal:Text> B9 H9 K9 M9 L9 V9 S9 N9 Q9 O9 G9 </terminal:Text>
<otherthings>blah blah blah</otherthings>
<terminal:Text>2 #AY5768 F9 A9 P. J9 C9 D9 I9 Y9 ROMMIL 0945 1410 * 777 0E</terminal:Text>
<terminal:Text> B9 H9 K9 M9 L9 V9 S9 N9 Q9 O9 G9 </terminal:Text>
<terminal:Text> E8 G8 R8 S8 T4 U2 </terminal:Text>
</root>
EOD;
$patternFirstLine = <<<EOD
~
\A
[0-9]+ \s+
\# (?<code1> [A-Z]{2} ) \s* (?<code2> [0-9]{1,4} ) \s+
(?<seat1> [A-Z][0-9.] (?: \s+ [A-Z][0-9.] )*+ ) \s+
(?<from> [A-Z]{3} ) (?<to> [A-Z]{3} ) \s+
(?<other> .*\S ) \s*
\z
~x
EOD;
$patternNextLines = <<<EOD
~
\A \s*
(?<seatN> [A-Z][0-9.] (?: \s+ [A-Z][0-9.] )*+ )
\s* \z
~x
EOD;
$parser = new XMLReader();
$parser->xml($xml);
$temp = false;
$results = [];
while ($parser->read()) {
    while ($parser->name === 'terminal:Text') {
        if (preg_match($patternFirstLine, $parser->readInnerXML(), $m)) {
            if ($temp) $results[] = $temp;
            $temp = [
                "flightNumber" => $m['code1'] . $m['code2'],
                "from" => $m['from'],
                "to" => $m['to'],
                "seats" => $m['seat1'],
                "other" => $m['other']
            ];
        } elseif ($temp && preg_match($patternNextLines, $parser->readInnerXML(), $m)) {
            $temp['seats'] .= ' ' . $m['seatN'];
        } else {
            $temp = false;
        }
        $parser->next('Text');
    }
}
if ($temp) $results[] = $temp;
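$results = array_map(function ($i) {
    // split on any run of whitespace, since the patterns allow \s+ between seats
    $seats = preg_split('/\s+/', $i['seats'], -1, PREG_SPLIT_NO_EMPTY);
    $i['seats'] = [];
    foreach ($seats as $seat) {
        // first character is the class letter, second is the digit or dot
        $i['seats'][$seat[0]] = $seat[1];
    }
    return $i;
}, $results);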
print_r($results);
Note: in this example I use XMLReader::xml() to load the XML content, but one of the main benefits of XMLReader is that you can use the XMLReader::open method with the URI of the XML instead.
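As a rough illustration of that note, the sketch below reads the terminal:Text lines from a file through XMLReader::open rather than from a string. The temporary file is only a stand-in so the snippet runs on its own; in practice the path or URI of the real document would go there. Collecting the lines into $lines is an assumption made for the demonstration.

```php
<?php
// Assumption: a temporary file stands in for the real XML file or URI.
$path = tempnam(sys_get_temp_dir(), 'flights');
file_put_contents($path, '<?xml version="1.0" encoding="utf-8" ?>
<root xmlns:terminal="http://test.com/terminal">
<terminal:Text>1 #AY5767 F9 A9 P. J9 C9 D9 I9 Y9 LHRMIA 0945 1410 * 777 0E</terminal:Text>
<terminal:Text> B9 H9 K9 M9 L9 V9 S9 N9 Q9 O9 G9 </terminal:Text>
</root>');

$reader = new XMLReader();
if (!$reader->open($path)) {
    throw new RuntimeException("Cannot open $path");
}

// Collect the text of every terminal:Text element, one node at a time.
$lines = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'terminal:Text') {
        $lines[] = $reader->readString();
    }
}
$reader->close();
unlink($path);

print_r($lines);
```

Because the reader moves through the document node by node, the whole file does not have to be built into a tree in memory first.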
Upvotes: 1
Reputation: 13807
You could try something simpler, although I'm not 100% sure that this covers everything you need.
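function yourFunction($input) {
    // get the individual words, meaning split on whitespace
    $words = preg_split('/\s+/', $input);
    // keep only two-character words that look like a seat: an uppercase
    // letter followed by a digit (str_split() returns strings, so is_int()
    // would never be true here). Accepting "." is an assumption based on
    // rows such as "I." in the sample data.
    $valuableInput = array_filter($words, function($word) {
        $chars = str_split($word);
        return count($chars) == 2
            && ctype_upper($chars[0])
            && (ctype_digit($chars[1]) || $chars[1] === '.');
    });
    // Create the mapping between the characters and the numbers
    $output = array();
    foreach ($valuableInput as $word) {
        $chars = str_split($word);
        $output[$chars[0]] = $chars[1];
    }
    return $output;
}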
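To check the split-on-whitespace idea against the two sample rows from the question, here is a small self-contained sketch. The name seatsFromRows is hypothetical, and the filter (one uppercase letter followed by a digit or a dot) is an assumption about what counts as a seat token.

```php
<?php
// Hypothetical helper: maps each seat class letter to its digit (or dot).
function seatsFromRows(string $input): array
{
    $seats = [];
    foreach (preg_split('/\s+/', trim($input)) as $word) {
        // Keep only tokens such as "F7" or "I."; this skips "LHRMIA",
        // "1040", "0E" and a bare flight number such as "57".
        if (preg_match('/^[A-Z][0-9.]$/', $word) === 1) {
            $seats[$word[0]] = $word[1];
        }
    }
    return $seats;
}

// First row and its continuation row, joined with a newline.
$rows = "3*O#AA6160 F7 A7 P7 J7 R7 D7 I7 Y7 LHRMIA 1040 1455 * 744 0E\n"
      . "B7 H0 W0 K0 M0 L0 V0 G0 S0 Q0 N0 O0";

$seats = seatsFromRows($rows);
echo count($seats), "\n";   // 8 seats on the first row + 12 on the second
print_r($seats);
```

The values come back as one-character strings; cast them with (int) if numbers are needed, bearing in mind that a dot would then become 0.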
Upvotes: 1