henrywright

Reputation: 10240

Regular expression to match 'hashtagged' words not matching the first one in string

I'm using the following regular expression to try to match all 'hashtagged' words in a given string:

/([^a-zA-Z0-9-_&])#([0-9a-zA-Z_]+)/

In the following string, #rather, #pointless and #text will be successfully matched:

My string: this is some #rather #pointless meaningless #text.

However, in a string where the very first word is hashtagged, only the subsequent hashtagged words (#pointless and #text) are matched:

My string: #rather #pointless meaningless #text

How can I ensure the very first word of my string is also matched if it is hashtagged?

EDIT:

I'm using the expression in my PHP script, more specifically inside a preg_replace() call like so:

$content = preg_replace( '/([^a-zA-Z0-9-_&])#([0-9a-zA-Z_]+)/', "$1<a href=\"/tags/$2\" title=\"$2\">#$2</a>", $content );

Upvotes: 0

Views: 31

Answers (3)

Ilya Kozhevnikov

Reputation: 10432

Does your language/engine support negative lookbehinds?

(?<![\w&-])#(\w+)

http://www.regexr.com/39alk
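
Since the question turned out to be about PHP, here is a minimal sketch of how the lookbehind version might be used with preg_replace(). It assumes the sample string from the question. The hyphen is written last in the character class so that it is read as a literal, and the replacement string is adjusted because the pattern now has only one capturing group; both adjustments are assumptions of this sketch.

```php
<?php
// Lookbehind variant: the hash must not be preceded by a word character,
// an ampersand or a hyphen. At the start of the string there is nothing
// before the hash, so the negative lookbehind succeeds there too.
$pattern = '/(?<![\w&-])#(\w+)/';

// Only one capturing group exists, so the tag name is $1. The preceding
// character is not consumed and does not need to be written back.
$replacement = '<a href="/tags/$1" title="$1">#$1</a>';

$content = '#rather #pointless meaningless #text';
$result = preg_replace($pattern, $replacement, $content);

echo $result, PHP_EOL;
```

With this input all three hashtags, including the first, should be wrapped in links, while something like `R&#38;D` is left alone because the hash follows an ampersand.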

Upvotes: 1

Gaute Løken

Reputation: 7912

What you need is to use the \w character class. Not sure what language you're writing in, but you can do this very simply like this:

/(\w*)#(\w+)/

Edit: Changed the above to make capturing groups fitting with your replacement string.

Upvotes: 3

pqnet

Reputation: 6598

The first group (between the parentheses) requires a character in front of the hash, so a hashtag at the very start of the string, where there is no preceding character, cannot match. You can allow the start of the string as an alternative:

/(^|[^a-zA-Z0-9-_&])#([0-9a-zA-Z_]+)/

As others suggested, you can avoid listing the characters explicitly by using the \w shorthand character class:

/(^|[^\w&-])#(\w+)/
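
As a rough illustration, the shorter pattern can be dropped into the preg_replace() call from the question without changing the replacement string, because it still has two capturing groups. This sketch assumes the second sample string from the question and writes the hyphen last in the character class so that it is read as a literal (an adjustment made here).

```php
<?php
// Group 1 is either the start of the string or one character that is not a
// word character, an ampersand or a hyphen. Group 2 is the tag name.
$pattern = '/(^|[^\w&-])#(\w+)/';

// Same replacement as in the question: $1 puts the preceding character back
// (it is empty when the hashtag is at the start), $2 is the tag name.
$replacement = '$1<a href="/tags/$2" title="$2">#$2</a>';

$content = '#rather #pointless meaningless #text';
$result = preg_replace($pattern, $replacement, $content);

echo $result, PHP_EOL;
```

Here #rather matches through the `^` alternative with an empty first group, and the other two hashtags match with a space in the first group, which the replacement writes back unchanged.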

Upvotes: 1
