I'm using the preg_match function in PHP to extract some values from an RSS feed. The feed content contains something like this:
<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>
I need to extract "A text with non alphanumeric characters" and "more text with non alphanumeric characters" (without the parenthesized text) so I can save them in a database. I'm not sure whether regular expressions are the best way to do this.
Thank you so much.
Upvotes: 1
Views: 83
Reputation: 1824
$str = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';
$str = preg_replace('~^.*?</strong>~', '', $str); // Remove leading markup
$str = preg_replace('~</li>$~', '', $str); // Remove trailing markup
$str = preg_replace('~\([^)]++\)~', '', $str); // Remove text within parentheses
$str = trim($str); // Clean up whitespace
$arr = preg_split('~\s*,\s*~', $str); // Split on the comma
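A minimal sketch extending the steps above to a feed excerpt containing several <li> items. It assumes every item follows the same "<strong>Label:</strong> value, value" layout; the second <li> is a made-up example, not real feed data. The `s` modifier lets `.` match newlines in case an item spans several lines, and PREG_SPLIT_NO_EMPTY drops empty pieces (for example from a trailing comma).

```php
<?php
// Hypothetical feed excerpt: one "<strong>Label:</strong> value, value" pair per <li>.
$feed = '<ul>'
    . '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>'
    . '<li><strong>Other:</strong> First value (note), second value</li>'
    . '</ul>';

// Collect every <li>...</li> element; "s" lets "." cross newlines.
preg_match_all('~<li>.*?</li>~s', $feed, $items);

$results = [];
foreach ($items[0] as $str) {
    $str = preg_replace('~^.*?</strong>~s', '', $str); // remove leading markup
    $str = preg_replace('~</li>$~', '', $str);         // remove trailing markup
    $str = preg_replace('~\([^)]++\)~', '', $str);     // remove text within parentheses
    // Split on commas, ignoring surrounding whitespace and empty pieces.
    $results[] = preg_split('~\s*,\s*~', trim($str), -1, PREG_SPLIT_NO_EMPTY);
}
print_r($results);
```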
Upvotes: 0
Reputation: 14071
Given that the structure is always the same, you can use this regex:
</strong>([^,]*),([^<]*)</li>
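Group 1 will contain the first fragment and group 2 the second. Both groups still include the parenthesized text and the surrounding whitespace, so you will need to strip those afterwards.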
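A short sketch of how the regex above might be used with preg_match, plus a cleanup step (not part of the original answer) that removes the "(...)" parts and trims each group. It assumes the first fragment contains no comma, since group 1 is `[^,]*`, and uses an arrow function, which needs PHP 7.4 or later.

```php
<?php
$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

// Group 1 = text up to the first comma, group 2 = the rest up to the next "<".
if (preg_match('~</strong>([^,]*),([^<]*)</li>~', $input, $m)) {
    // Strip each "(...)" together with the whitespace before it, then trim.
    $clean = array_map(
        fn (string $s): string => trim(preg_replace('~\s*\([^)]*\)~', '', $s)),
        [$m[1], $m[2]]
    );
    print_r($clean);
}
```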
Once you start parsing HTML/XML with regexes, it quickly becomes apparent that a full-blown parser is better suited. For a small or throwaway solution, though, a regex can be useful.
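As a minimal sketch of the parser route mentioned above, PHP's built-in DOMDocument can parse the fragment and hand back plain text, leaving only the final split to regexes. This assumes the dom extension is enabled and that the HTML fragment has already been pulled out of the feed item (for example with SimpleXML); `$html` stands in for that fragment.

```php
<?php
// Assumption: $html is the HTML fragment already extracted from the feed item.
$html = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

libxml_use_internal_errors(true); // feed HTML is often not perfectly valid
$doc = new DOMDocument();
// The meta tag tells libxml the fragment is UTF-8 (it assumes ISO-8859-1 otherwise).
$doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);

$results = [];
foreach ($doc->getElementsByTagName('li') as $li) {
    // Remove the <strong>label</strong> so only the value text is left.
    // iterator_to_array() copies the live node list before the tree is modified.
    foreach (iterator_to_array($li->getElementsByTagName('strong')) as $strong) {
        $strong->parentNode->removeChild($strong);
    }
    // textContent is plain text with entities decoded; drop "(...)" and split on commas.
    $text = preg_replace('~\s*\([^)]*\)~', '', $li->textContent);
    $results[] = preg_split('~\s*,\s*~', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}
print_r($results);
```

Letting the parser deal with tags and entities means the remaining regexes only ever see plain text, which keeps them simple.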
Upvotes: 0
Reputation: 3305
If you want to use a regex (i.e. quick and dirty, not very maintainable), this will give you the text:
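$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';
// Capture everything between </strong> and </li>
if (preg_match("#</strong>(.*?)</li>#", $input, $matches)) {
    // Remove the text inside parentheses along with the whitespace before it
    echo trim(preg_replace("#\s*\(.*?\)#", '', $matches[1]));
    // Prints: A text with non alphanumeric characters, more text with non alphanumeric characters
}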
Note, though, that this may fail with nested parentheses.
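One way to cope with nested parentheses, sketched under the assumption that they are balanced: repeatedly remove innermost groups (a "(" followed by no other parentheses before its ")") until preg_replace reports no more replacements. The helper name `strip_parens` is hypothetical. Unbalanced input simply stops matching, so the loop always terminates.

```php
<?php
// Hypothetical helper: removes parenthesized text, including nested groups
// such as "(with (nested) brackets)", by stripping innermost groups repeatedly.
function strip_parens(string $s): string
{
    do {
        // Innermost "(...)" plus the whitespace before it; $count receives
        // the number of replacements made in this pass.
        $s = preg_replace('~\s*\([^()]*\)~', '', $s, -1, $count);
    } while ($count > 0);
    return trim($s);
}

echo strip_parens('A text (with (nested) brackets), more text (x)'), "\n";
// Prints: A text, more text
```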
Upvotes: 1