Reputation: 171
I have a string with custom markup for saving songs with chords, tabulatures, notes etc. It contains
things in various brackets: \[.+?\]
, \[[.+?\]]
, \(.+?\)
arrows: <-{3,}>
, \-{3,}>
, <\-{3,}
and so on...
Sample text might be
Text Text [something]
--->
Text (something 021213)
Now I wish to parse the markup into array of tokens, objects of corresponding classes, which would look like (matched parts in brackets)
ParsedBlock_Text ("Text Text ")
ParsedBlock_Chord ("something")
ParsedBlock_Text (" ")
ParsedBlock_NewColumn
ParsedBlock_Text (" text ")
ParsedBlock_ChordDiagram ("something 021213")
I know how to match them, but either I must match each different pattern, and save offsets to properly sort the array, or I match them at once and I don't know which one has been matched.
Thanks, MK
Upvotes: 2
Views: 116
Reputation: 89171
Assuming you do not try to nest these structures, this will tokenize your text:
function ParseText($text) {
$re = '/\[\[(?P<DoubleBracket>.*?)]]|\[(?P<Bracket>.*?)]|\((?P<Paren>.*?)\)|(?<Arrow><---+>?|---+>)/s';
$keys = array('DoubleBracket', 'Bracket', 'Paren', 'Arrow');
$result = array();
$lastStart = 0;
if (preg_match_all($re, $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)) {
foreach ($matches as $match) {
$start = $match[0][1];
$prefix = substr($text, $lastStart, $start - $lastStart);
$lastStart = $start + strlen($match[0][0]);
if ($prefix != '' && !ctype_space($prefix)) {
$result []= array('Text', trim($prefix));
}
foreach ($keys as $key) {
if (isset($match[$key]) && $match[$key][1] >= 0) {
$result []= array($key, $match[$key][0]);
break;
}
}
}
}
$prefix = substr($text, $lastStart);
if ($prefix != '' && !ctype_space($prefix)) {
$result []= array('Text', trim($prefix));
}
return $result;
}
Example:
$mytext = <<<'EOT'
Text Text [something]
--->
Text (something 021213)
More Text
EOT;
$parsed = ParseText($mytext);
foreach ($parsed as $item) {
print_r($item);
}
Output:
Array
(
[0] => Text
[1] => Text Text
)
Array
(
[0] => Bracket
[1] => something
)
Array
(
[0] => Arrow
[1] => --->
)
Array
(
[0] => Text
[1] => Text
)
Array
(
[0] => Paren
[1] => something 021213
)
Array
(
[0] => Text
[1] => More Text
)
If you want to add more patterns to the regex, make sure you put longer patterns at the start, so they are not mistakenly matched as the wrong type.
Upvotes: 1