Hubert Siwkin

Reputation: 385

How to write a hashtag matching regex

I'm having trouble writing a regex (in Ruby, though I don't think that changes anything) that selects all proper hashtags.

I used ( /(^|\s)(#+)(\w+)(\s|$)/ ), which doesn't work and I have no idea why.

In this example:

#start #middle #middle2 #middle3 bad#example #another#bad#example #end

it should mark #start, #middle, #middle2, #middle3 and #end.

Why doesn't my code work and how should a proper regex look?

Upvotes: 0

Views: 249

Answers (4)

Appak

Reputation: 492

As for why the original does not work, let's look at each part:

  1. (^|\s) Start of line or white space
  2. (#+) one or more #
  3. (\w+) one or more word characters (letters, digits or underscore)
  4. (\s|$) white space or end of line

The main problem is a conflict between 1 and 4. The whitespace that part 1 needs in front of a hashtag has already been consumed by part 4 of the previous match. Part 1 therefore cannot match at that position, and the regex moves on to the next possible match, skipping that hashtag.
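To make the conflict concrete, here is a minimal Ruby sketch that runs the original pattern over the sample text from the question. The variable names are only illustrative.

```ruby
# Sample text from the question.
text = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# The original pattern: the trailing (\s|$) consumes the separator,
# so the following tag has no leading whitespace left for (^|\s).
original = /(^|\s)(#+)(\w+)(\s|$)/

# With capture groups, String#scan returns one array of groups per match;
# indices 1 and 2 hold the hash sign(s) and the tag name.
original_tags = text.scan(original).map { |groups| groups[1] + groups[2] }

puts original_tags.inspect
# => ["#start", "#middle2", "#end"]
```

Every tag that directly follows a matched tag is skipped, which is why #middle and #middle3 are missing.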

Part 4 is not really needed, since part 3 will not match whitespace anyway.

So here is the result

(?:^|\s)#(\w+)

https://regex101.com/r/iU4dZ3/3
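As a quick check in Ruby, the sketch below applies this pattern to the sample text. Note that it also returns "another" (from #another#bad#example), because nothing is required after the tag. If that is unwanted, one possible variant, which is my assumption and not part of the pattern above, is to use lookarounds that inspect the neighbouring characters without consuming them.

```ruby
text = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# Pattern from this answer; the single capture group holds the tag name.
suggested = text.scan(/(?:^|\s)#(\w+)/).flatten

# Assumed stricter variant: (?<!\S) means "not preceded by a non-space",
# (?!\S) means "not followed by a non-space". Neither consumes the space,
# so adjacent tags no longer compete for the same separator.
strict = text.scan(/(?<!\S)#\w+(?!\S)/)

puts suggested.inspect
# => ["start", "middle", "middle2", "middle3", "another", "end"]
puts strict.inspect
# => ["#start", "#middle", "#middle2", "#middle3", "#end"]
```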

Upvotes: 4

sinisake

Reputation: 11338

One more regex:

\B#\w+\b

This one doesn't capture the surrounding whitespace.

https://regex101.com/r/iU4dZ3/4
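A short Ruby sketch of how this pattern behaves on the question's sample text. The \B requires that the # is not directly preceded by a word character, which rejects bad#example. The leading #another of #another#bad#example still matches, though.

```ruby
text = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# No capture groups, so String#scan returns the matched strings directly.
boundary_tags = text.scan(/\B#\w+\b/)

puts boundary_tags.inspect
# => ["#start", "#middle", "#middle2", "#middle3", "#another", "#end"]
```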

Upvotes: 0

kaho

Reputation: 4784

Does [^#\w](#[\w]*)|^(#[\w]*) work?

It matches a # that does not follow a word character or another #, and captures everything up to the first non-word character.

The alternation handles the case where the # is the first character of the line.

Live demo: http://regexr.com/3al01
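In Ruby, this pattern could be tried roughly as follows. Since each alternative has its own capture group, every match leaves one group nil, so the sketch drops the nils with compact.

```ruby
text = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# Each match yields [group1, group2] with exactly one of them nil.
alt_tags = text.scan(/[^#\w](#[\w]*)|^(#[\w]*)/).flatten.compact

puts alt_tags.inspect
# => ["#start", "#middle", "#middle2", "#middle3", "#another", "#end"]
```

Because of the * quantifier, a lone # with nothing after it would also be captured. Using + instead would require at least one word character.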

Upvotes: 1

chris85

Reputation: 23880

How's this work for you?

(#[^\s+]+)

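This finds a hash sign followed by everything up to the next whitespace character. (Because the + sits inside the character class, the match also stops at a literal +.)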
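For comparison, a small Ruby sketch of this pattern on the sample text from the question. It does not check what precedes the #, so it also picks up the tail of bad#example and the whole of #another#bad#example.

```ruby
text = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# One capture group, so String#scan returns one-element arrays; flatten them.
loose_tags = text.scan(/(#[^\s+]+)/).flatten

puts loose_tags.inspect
# => ["#start", "#middle", "#middle2", "#middle3", "#example", "#another#bad#example", "#end"]
```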

Upvotes: 0
