nickfindley


preg_match to remove stray HTML

I'm having trouble eliminating some stray portions of HTML that are ending up in an automatically generated excerpt for a WordPress site. For example, at the beginning of an excerpt I'll see:

href=”https://stackoverflow.com”>Excerpt text starts here...

or at the end of an excerpt:

...excerpt text ends here <a

So it seems what I'm looking for is a way to match and remove any run of non-space characters at the start of the excerpt that ends with ">", or any run of non-space characters at the end of the excerpt that starts with "<".


Answers (1)

Nick


Assuming the excerpt text itself contains no < or > characters, there are two approaches you can take. One is to use preg_match to capture the run of characters that sits between a > (or the start of the string) and a < (or the end of the string). The other is to use preg_replace to remove the stray segments you describe in your question. For example:

$excerpts = array('href=”https://stackoverflow.com”>Excerpt text starts here... ...excerpt text ends here <a',
    'href=”https://stackoverflow.com”>Excerpt text starts here... ...excerpt text ends here',
    'Excerpt text starts here... ...excerpt text ends here <a',
    'Excerpt text starts here... ...excerpt text ends here'
);

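// Approach 1: preg_match. Capture the run of non-bracket characters that
// follows the start of the string or a ">" and precedes a "<" or the end.
foreach ($excerpts as $excerpt) {
    preg_match('/(?<=^|>)[^<>]+(?=<|$)/', $excerpt, $matches);
    echo $matches[0] . PHP_EOL;
}

// Approach 2: preg_replace. Delete everything up to the last ">", then
// everything from the first remaining "<" to the end of the string.
foreach ($excerpts as $excerpt) {
    echo preg_replace(array('/.*>/', '/<.*$/'), '', $excerpt) . PHP_EOL;
}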

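Output (both loops print this same text for every input; results from inputs ending in <a keep a trailing space):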

Excerpt text starts here... ...excerpt text ends here 

Demo on 3v4l.org
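
If the excerpt text might legitimately contain < or > (for example "5 > 3"), the patterns above would cut into it. A minimal sketch of a stricter variant follows, which sticks to the wording of the question: it only removes a run of non-space characters at the very start that ends in ">", or one at the very end that starts with "<". The function name strip_stray_html is hypothetical, and the sketch assumes the excerpt is valid UTF-8 (because of the u modifier) and that the stray fragment contains no spaces.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes valid UTF-8 input and that the stray fragment has no spaces in it.
function strip_stray_html(string $excerpt): string
{
    $patterns = array(
        '/^\S*>/u', // leading run of non-space characters ending in ">"
        '/<\S*$/u', // trailing run of non-space characters starting with "<"
    );

    $cleaned = preg_replace($patterns, '', $excerpt);

    // preg_replace returns null on failure (e.g. invalid UTF-8);
    // fall back to the untouched excerpt in that case.
    if ($cleaned === null) {
        return $excerpt;
    }

    // Drop the whitespace left behind where a fragment was removed.
    return trim($cleaned);
}

echo strip_stray_html('...excerpt text ends here <a') . PHP_EOL;
```

Because both patterns are anchored and use \S, a bracket surrounded by spaces in the middle of the text is left alone. The trade-off is that a fragment cut off after a space, such as a trailing <a href=..., would not be removed by this variant.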
