Pez Cuckow

Reputation: 14412

Regex find the first word

I'm trying to use a regex to wrap the first word of a page's content in a span. The content contains HTML, so I need to make sure that only an actual word gets chosen, not part of a tag. The content changes for every page.

Current script is:

preg_match('/(<(.*?)>)*/i',$page_content,$matches);
$stripped = substr($page_content,strlen($matches[0]));
preg_match('/\b[a-z]* \b/i',$stripped,$strippedmatch);
echo substr($page_content, 0, strlen($matches[0])).'<span class="h1">'.$strippedmatch[0].'</span>'.substr($stripped, strlen($strippedmatch[0]));

However, if $page_content is <p><span class="title">This is </span> my title!</p>, then my regex thinks the first word is "span" and adds the tags around that.

Is there any way to fix this (or a better way to do it)?

Upvotes: 0

Views: 2276

Answers (3)

polygenelubricants

Reputation: 383736

You shouldn't be using regex for this, but if you insist, you can try something like this:

<?php

$texts = array(
  '<p><span class="title">This is </span> my title!</p>',
  '<1>   <2>   <3>   blah   blah   <4> <5> blah',
  'garbage <1> <2> real stuff begins <3> <4>',
);

foreach ($texts as $text) {
  print preg_replace('/(>\s*)(\w+)/', '\1{{\2}}', $text, 1)."\n";
}

?>

This prints:

<p><span class="title">{{This}} is </span> my title!</p>
<1>   <2>   <3>   {{blah}}   blah   <4> <5> blah
garbage <1> <2> {{real}} stuff begins <3> <4>
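
For the non-regex route hinted at above, one option is to parse the HTML and only look at text nodes. The following is a minimal sketch, not part of the original answer: wrapFirstWord is a hypothetical helper name, it assumes PHP's DOM extension is available, and it assumes plain ASCII input with no stray closing div tags (other encodings would need extra handling when loading).

```php
<?php
// Hypothetical helper: wraps the first word found in any text node
// of an HTML fragment in <span class="h1">.
function wrapFirstWord(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);            // tolerate sloppy markup
    $doc->loadHTML('<div>' . $html . '</div>');  // the div is our own wrapper
    libxml_clear_errors();

    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $xpath = new DOMXPath($doc);

    // Visit text nodes in document order; tag names and attributes are never seen here.
    foreach ($xpath->query('.//text()', $wrapper) as $node) {
        // Split into: leading non-word characters, first word, remainder.
        if (!preg_match('/^(\W*)(\w+)(.*)$/su', $node->nodeValue, $m)) {
            continue; // whitespace-only or punctuation-only text node
        }
        $span = $doc->createElement('span');
        $span->setAttribute('class', 'h1');
        $span->appendChild($doc->createTextNode($m[2]));

        // Replace the original text node with: leading text, span, trailing text.
        $parent = $node->parentNode;
        $parent->insertBefore($doc->createTextNode($m[1]), $node);
        $parent->insertBefore($span, $node);
        $parent->insertBefore($doc->createTextNode($m[3]), $node);
        $parent->removeChild($node);
        break; // only the first word is wrapped
    }

    // Serialize the wrapper's children, leaving out the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrapFirstWord('<p><span class="title">This is </span> my title!</p>'), "\n";
```

Because the search runs over text nodes only, a word inside a tag name or attribute value can never be picked, which is the failure described in the question.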

Upvotes: 0

Tor Valamo

Reputation: 33749

This seems to work...

(?<=\>)\b\w*\b|^\w*\b

If you also want to allow whitespace in front of the word (remember to trim the resulting string):

(?<=>)\s*\b\w*\b|^\s*\w*\b
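
As a rough sketch of how the second pattern could be applied in PHP (the callback and the span markup are assumptions taken from the question, not from this answer), preg_replace_callback with a limit of 1 lets you strip the leading whitespace from the match before wrapping the word:

```php
<?php
$pattern = '/(?<=>)\s*\b\w*\b|^\s*\w*\b/';
$html = '<p><span class="title">This is </span> my title!</p>';

$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // The match may start with whitespace; keep it outside the span.
        $word = preg_replace('/^\s+/', '', $m[0]);
        $lead = substr($m[0], 0, strlen($m[0]) - strlen($word));
        return $lead . '<span class="h1">' . $word . '</span>';
    },
    $html,
    1 // replace the first match only
);

echo $result, "\n";
// <p><span class="title"><span class="h1">This</span> is </span> my title!</p>
```

Note that the second alternative (the one anchored with ^) also matches text that comes before any tag, so for input such as 'garbage <1> <2> real stuff' the word wrapped would be "garbage".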

Upvotes: 1

Spidfire

Reputation: 5523

If I understand you correctly, you want a tag around the first word that is not itself part of a tag. With regex, you could do that like this:

$code = preg_replace('/^((?:<.+?>\s*)+?)(\w+)/i', '\1<span class="h1">\2</span>', $code);

This steps over the leading tags until it finds text outside them. The outer group captures all of the leading tags, so \1 puts every one of them back.
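
A detail worth checking in patterns of this shape: in PCRE, a capturing group that repeats keeps only its last iteration, so it matters whether the repetition sits inside or outside the group that \1 refers to. The comparison below is an illustrative sketch using the sample input from the question; the variable names are made up for the example.

```php
<?php
$code = '<p><span class="title">This is </span> my title!</p>';
$replacement = '\1<span class="h1">\2</span>';

// Repetition applied to the capturing group itself:
// \1 ends up holding only the last tag that was matched.
$lossy = preg_replace('/^(<.+?>\s*)+?(\w+)/', $replacement, $code);

// Repetition inside a non-capturing group, wrapped by one capturing group:
// \1 holds the whole run of leading tags.
$kept = preg_replace('/^((?:<.+?>\s*)+?)(\w+)/', $replacement, $code);

echo $lossy, "\n";
// <span class="title"><span class="h1">This</span> is </span> my title!</p>
echo $kept, "\n";
// <p><span class="title"><span class="h1">This</span> is </span> my title!</p>
```

In the first variant the opening <p> is dropped from the output, because it was matched by an earlier iteration of the group and never made it into \1.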

Upvotes: 0
