Reputation: 10240
I'm using the following regular expression to try to match all 'hashtagged' words in a given string:
/([^a-zA-Z0-9-_&])#([0-9a-zA-Z_]+)/
In the following string, #rather, #pointless and #text will be successfully matched:
My string: this is some #rather #pointless meaningless #text.
However, in a string where the very first word is hashtagged, only the subsequent hashtagged words (#pointless and #text) are matched:
My string: #rather #pointless meaningless #text
How can I ensure the very first word of my string is also matched if it is hashtagged?
EDIT:
I'm using the expression in my PHP script, or more specifically, inside a preg_replace() function like so:
$content = preg_replace( '/([^a-zA-Z0-9-_&])#([0-9a-zA-Z_]+)/', "$1<a href=\"/tags/$2\" title=\"$2\">#$2</a>", $content );
Upvotes: 0
Views: 31
Reputation: 10432
Does your language/engine support negative lookbehinds?
(?<![\w&-])#(\w+)
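If lookbehinds are available (PHP's preg_* functions use PCRE, which supports them), a minimal sketch of the replacement could look like the following. Note that the lookbehind does not capture anything, so the tag name ends up in $1 rather than $2. The variable names and the sample string are only illustrative.

```php
<?php
// Sketch only: link hashtags using a negative lookbehind.
// The hyphen sits last in the class so it is read as a literal character.
$pattern = '/(?<![\w&-])#(\w+)/';

// The lookbehind is not a capture group, so the tag name is $1 here.
$replacement = '<a href="/tags/$1" title="$1">#$1</a>';

// Sample input whose very first word is hashtagged.
$content = '#rather #pointless meaningless #text';

// preg_replace() replaces every match by default; PHP has no /g modifier.
$linked = preg_replace($pattern, $replacement, $content);

echo $linked, PHP_EOL;
```

With this input, all three tags should be turned into links, including the leading #rather.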
Upvotes: 1
Reputation: 7912
What you need is to use the \w character class. Not sure what language you're writing in, but you can do this very simply like this:
/(\w*)#(\w+)/
Edit: Changed the above to make capturing groups fitting with your replacement string.
Upvotes: 3
Reputation: 6598
The first group (between the parentheses) requires a character in front of the hash, so a hashtag at the very start of the string has nothing to match against. You can add ^ as an alternative so that the start of the string is accepted as well:
/(^|[^a-zA-Z0-9-_&])#([0-9a-zA-Z_]+)/
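As some suggested, you can shorten this with the \w shorthand character class instead of listing the characters explicitly. The hyphen is moved to the end of the class so that it is read as a literal, since newer PCRE versions can reject a hyphen placed directly after \w as an invalid range:
/(^|[^\w&-])#(\w+)/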
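Plugged into the preg_replace() call from the question, this might look roughly like the sketch below. It keeps the original replacement string, where $1 is the character in front of the hash (or nothing at the start of the string) and $2 is the tag name. The sample input and the $linked variable are assumptions made for the example.

```php
<?php
// Sketch only: either the start of the string or a non-word,
// non-&, non-hyphen character must come before the hash.
$pattern = '/(^|[^\w&-])#(\w+)/';

// $1 puts back whatever preceded the hash; $2 is the tag name.
$replacement = '$1<a href="/tags/$2" title="$2">#$2</a>';

// Sample input whose very first word is hashtagged.
$content = '#rather #pointless meaningless #text';

// No /g modifier is needed: preg_replace() replaces all matches by default.
$linked = preg_replace($pattern, $replacement, $content);

echo $linked, PHP_EOL;
```

The leading #rather is now matched through the ^ alternative, while the later tags are matched through the space in front of them.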
Upvotes: 1