I'm having trouble eliminating some stray portions of HTML that are ending up in an automatically generated excerpt for a WordPress site. For example, at the beginning of an excerpt I'll see:
href=”https://stackoverflow.com”>Excerpt text starts here...
or at the end of an excerpt:
...excerpt text ends here <a
So what I'm looking for is a way to match and remove any string of non-space characters at the start of the excerpt that ends with ">", and any string of non-space characters at the end of the excerpt that starts with "<".
If the excerpt text itself doesn't contain any "<" or ">" characters, there are a couple of approaches you can take. One is to remove the unwanted segments with preg_replace; the other is to use preg_match to find the run of characters between a ">" (or the start of the string) and a "<" (or the end of the string). For example:
$excerpts = array(
    'href=”https://stackoverflow.com”>Excerpt text starts here... ...excerpt text ends here <a',
    'href=”https://stackoverflow.com”>Excerpt text starts here... ...excerpt text ends here',
    'Excerpt text starts here... ...excerpt text ends here <a',
    'Excerpt text starts here... ...excerpt text ends here'
);

// Approach 1: keep the first run of non-"<>" characters that starts at the
// beginning of the string or after a ">" and ends before a "<" or at the end.
foreach ($excerpts as $excerpt) {
    if (preg_match('/(?<=^|>)[^<>]+(?=<|$)/', $excerpt, $matches)) {
        echo trim($matches[0]) . PHP_EOL;
    }
}

// Approach 2: delete everything up to the last ">" and from the first "<" on.
foreach ($excerpts as $excerpt) {
    echo trim(preg_replace(array('/.*>/', '/<.*$/'), '', $excerpt)) . PHP_EOL;
}
Output (both loops print this same line for each of the four excerpts):
Excerpt text starts here... ...excerpt text ends here
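The preg_replace patterns in the answer strip everything up to the last ">" and from the first "<" onward, so an excerpt whose own text contains ">" loses more than just the stray fragment. A narrower sketch that follows the question's literal description (a run of non-whitespace characters at the very start that ends in ">", and a run at the very end that starts with "<") could look like this. The function name clean_excerpt_edges and the sample strings are hypothetical, and it assumes the stray fragment contains no spaces: a leftover such as href="..." target="_blank"> would not be caught at the start.

```php
<?php
// Hypothetical helper: removes only the dangling tag fragments at the edges.
//   '/^\S*>/'  a run of non-whitespace characters at the start ending in ">"
//   '/<\S*$/'  a run of non-whitespace characters at the end starting with "<"
function clean_excerpt_edges(string $excerpt): string
{
    $cleaned = preg_replace(array('/^\S*>/', '/<\S*$/'), '', $excerpt);

    // preg_replace returns null on error; fall back to the original text,
    // then trim the space left behind where a fragment was removed.
    return trim($cleaned ?? $excerpt);
}

// Hypothetical samples; the second one contains a legitimate ">".
$samples = array(
    'href=”https://stackoverflow.com”>Excerpt text starts here... ...excerpt text ends here <a',
    'Prices went up > 5% this year <a',
    'Excerpt text starts here... ...excerpt text ends here',
);

foreach ($samples as $sample) {
    echo clean_excerpt_edges($sample) . PHP_EOL;
}
```

For the second sample, the answer's '/.*>/' pattern would also delete "Prices went up >", whereas this version keeps the text and removes only the trailing "<a".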