I need to parse blocks of text in a format like this:
Today the weather is excellent bla bla bla.
<temperature>35</temperature>.
I'm in a great mood today.
<item>Desk</item>
I want to parse text like this and turn it into an array like the following:
$array[0]['text'] = 'Today the weather is excellent bla bla bla. ';
$array[0]['type'] = 'normalText';
$array[1]['text'] = '35';
$array[1]['type'] = 'temperature';
$array[2]['text'] = ". I'm in a great mood today.";
$array[2]['type'] = 'normalText';
$array[3]['text'] = 'Desk';
$array[3]['type'] = 'item';
Essentially, I want the array to contain all of the text in its original order, split into types: normal text (anything that is not between tags) and other types, such as temperature and item, determined by the tags enclosing the text.
Is there a way to do this with regular expressions (i.e. separate the text into normal text and the other types), or should I convert the text behind the scenes into properly structured text, like:
<normal>Today the weather is excellent bla bla bla.</normal>
<temperature>35</temperature>.
<normal> I'm in a great mood today.</normal><item>Desk</item>
before parsing it?
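One way to try the structured-text route without adding <normal> tags by hand is to let an XML parser do the separation: once the sample is wrapped in a single root element, it is a small XML document in which untagged text shows up as text nodes and tagged text as elements. The sketch below assumes PHP's DOM extension is available and that the text is well-formed XML after wrapping (no bare & or <, every tag closed); parseAsXml() is a hypothetical helper name.

```php
<?php
// Sketch: let an XML parser tell untagged text apart from tagged elements.
// Assumes ext-dom is installed and the input is well-formed XML once wrapped.
// parseAsXml() is a hypothetical helper name.
function parseAsXml(string $input): array
{
    $doc = new DOMDocument();
    // XML needs a single root element, so wrap the fragment in one
    if (!$doc->loadXML('<root>' . $input . '</root>')) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }

    $result = [];
    foreach ($doc->documentElement->childNodes as $node) {
        if ($node->nodeType === XML_TEXT_NODE) {
            // Text that is not inside any tag
            $result[] = ['text' => $node->textContent, 'type' => 'normalText'];
        } elseif ($node->nodeType === XML_ELEMENT_NODE) {
            // The element's tag name becomes the type
            $result[] = ['text' => $node->textContent, 'type' => $node->nodeName];
        }
    }
    return $result;
}

$text = <<<'TEXT'
Today the weather is excellent bla bla bla.
<temperature>35</temperature>.
I'm in a great mood today.
<item>Desk</item>
TEXT;

print_r(parseAsXml($text));
```

The trade-off is that the input must really be valid XML; a stray & in the normal text would make loadXML() fail.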
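Solution: split the text around the tagged segments with preg_split() (the capture group plus PREG_SPLIT_DELIM_CAPTURE keeps those segments in the result), then classify each piece as tagged or normal text: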
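<?php
$code = <<<'CODE'
Today the weather is excellent bla bla bla.
<temperature>35</temperature>.
I'm in a great mood today.
<item>Desk</item>
CODE;

$result = array_filter(                // drops the false returned for empty pieces
    array_map(
        function ($element) {
            if (!empty($element)) {
                // A tagged piece: capture the tag name and the text between the tags
                if (preg_match('/^\<([^\>]+)\>([^\<]+)\</', $element, $matches)) {
                    return array('text' => $matches[2],
                                 'type' => $matches[1]);
                }
                // Anything else is untagged, normal text
                return array('text' => $element,
                             'type' => 'normal');
            }
            return false;
        },
        // Split around <tag>text</tag> segments and keep them in the result.
        // A limit of -1 means "no limit" (passing null is deprecated as of PHP 8.1).
        preg_split('/(\<[^\>]+\>[^\<]+\<\/[^\>]+\>)/', $code, -1, PREG_SPLIT_DELIM_CAPTURE)
    )
);

print_r($result);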
Output:
Array
(
[0] => Array
(
[text] => Today the weather is excellent bla bla bla.
[type] => normal
)
[1] => Array
(
[text] => 35
[type] => temperature
)
[2] => Array
(
[text] => .
I'm in a great mood today.
[type] => normal
)
[3] => Array
(
[text] => Desk
[type] => item
)
)
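A variation on the same preg_split() idea, sketched under the assumptions that tag names consist of word characters and tags are not nested. The backreference \1 makes the closing tag repeat the opening tag's name, and because the pattern has exactly two capture groups, the split result always alternates normal text, tag name, tagged text. parseTaggedText() is a hypothetical name, and it uses the normalText type from the question rather than normal.

```php
<?php
// Variation on the preg_split() answer; parseTaggedText() is a hypothetical name.
// Assumes tag names are word characters and tags are not nested.
function parseTaggedText(string $input): array
{
    // Group 1 = tag name, group 2 = enclosed text; \1 requires the closing tag
    // to repeat the opening tag's name. The /s flag lets text span newlines.
    $parts = preg_split('/<(\w+)>(.*?)<\/\1>/s', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    // With both groups captured, $parts looks like
    // [normal, tag, text, normal, tag, text, ..., normal]
    $result = [];
    for ($i = 0, $n = count($parts); $i < $n; $i += 3) {
        if ($parts[$i] !== '') {
            $result[] = ['text' => $parts[$i], 'type' => 'normalText'];
        }
        if ($i + 2 < $n) {
            $result[] = ['text' => $parts[$i + 2], 'type' => $parts[$i + 1]];
        }
    }
    return $result;
}

$text = <<<'TEXT'
Today the weather is excellent bla bla bla.
<temperature>35</temperature>.
I'm in a great mood today.
<item>Desk</item>
TEXT;

print_r(parseTaggedText($text));
```

Unlike the version above, this returns a list with consecutive keys, so no array_filter() pass is needed.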
Another approach: read through the text line by line. There are two cases: adding normal text and adding text enclosed in a special tag. While adding normal text to a variable, look for an opening tag with a regular expression:
preg_match("/\<(\w+)\>/", $line_from_text, $matches)
This matches an opening tag; the parentheses capture the tag name into $matches[1], which you can use as the type in your array. Then keep adding text to a variable until you reach the matching end tag. Hope this helps.
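A minimal sketch of this line-by-line idea, assuming PHP 8, tag names made of word characters, and no nested tags. Because a tag can open and close partway through a line (as in the question's sample), each line is scanned repeatedly until it is used up. parseLineByLine() is a hypothetical name.

```php
<?php
// Sketch of the line-by-line approach; parseLineByLine() is a hypothetical name.
// Assumes PHP 8, word-character tag names, and no nested tags.
function parseLineByLine(string $input): array
{
    $result = [];
    $buffer = '';          // text collected for the current segment
    $type = 'normalText';  // stays 'normalText' until an opening tag is seen

    foreach (explode("\n", $input) as $lineNo => $line) {
        if ($lineNo > 0) {
            $buffer .= "\n";  // put back the newline that explode() removed
        }
        while ($line !== '') {
            if ($type === 'normalText') {
                // Look for an opening tag; (\w+) captures the whole tag name
                if (!preg_match('/<(\w+)>/', $line, $m, PREG_OFFSET_CAPTURE)) {
                    $buffer .= $line;
                    break;
                }
                $buffer .= substr($line, 0, $m[0][1]);
                if ($buffer !== '') {
                    $result[] = ['text' => $buffer, 'type' => $type];
                }
                $buffer = '';
                $type = $m[1][0];
                $line = substr($line, $m[0][1] + strlen($m[0][0]));
            } else {
                // Inside a tag: collect text until the matching end tag
                $endTag = '</' . $type . '>';
                $pos = strpos($line, $endTag);
                if ($pos === false) {
                    $buffer .= $line;
                    break;
                }
                $result[] = ['text' => $buffer . substr($line, 0, $pos), 'type' => $type];
                $buffer = '';
                $type = 'normalText';
                $line = substr($line, $pos + strlen($endTag));
            }
        }
    }
    if ($buffer !== '') {
        // Trailing text: normal text, or the content of an unclosed tag
        $result[] = ['text' => $buffer, 'type' => $type];
    }
    return $result;
}

$text = <<<'TEXT'
Today the weather is excellent bla bla bla.
<temperature>35</temperature>.
I'm in a great mood today.
<item>Desk</item>
TEXT;

print_r(parseLineByLine($text));
```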