Reputation: 203
So I am trying to incorporate a hashtag search into posts on my application. I have come up with a few close Regular Expressions, but none seem to work. Let's say I have a string:
#i keep #hashtags in my mom's attic. She says I am her number #1 son. Why I'm not num#ber #2: http://pics.com/pic#thepic
I would want the RegEx to match:
I want it to specifically NOT match HTML entities, #anchors in URLs, hashtags that start in the middle of a word, and any trailing punctuation after a hashtag. I also want to make sure it matches a hashtag at the beginning of the string, as well as hashtags that are wholly numeric. I want it to allow hyphens and underscores in hashtags as well. This is for a MySQL query, so Lookaheads and Lookbacks won't work.
The current one I'm stuck on is:
#([A-Za-z0-9_]+)
But it's not really doing the job. RegEx is an area in which my knowledge is a bit lacking. Thanks for any help!
Upvotes: 3
Views: 3951
Reputation: 562428
The following matches all the examples you want matched, and rejects all the ones you don't want matched:
WHERE mycolumn REGEXP '^#[[:alnum:]]' OR mycolumn REGEXP ' #[[:alnum:]]'
You don't have to match the whole hashtag, since SQL won't return that string anyway, it just returns 1 (match) or 0 (no match). To extract the hashtag string, you'll have to use some other string manipulation in you application programming language.
Upvotes: 2
Reputation: 47084
I think you're stuck with testing it in 3 rounds:
`txt` REGEXP '^#[0-9a-zA-Z]+[: ,]';
OR `txt` REGEXP '[: ,]#[0-9a-zA-Z]+[: ,]';
OR `txt` REGEXP '[: ,]#[0-9a-zA-Z]+$';
or you could of course do
CONCAT(" ",`txt`," ") REGEXP '[: ,]#[0-9a-zA-Z]+[: ,]';
either way, it will get messy. Of course the [: ,]
may be expanded to what you think is not part of things like '
...
Upvotes: 0