Matteo

Reputation: 1271

Regular expression Hashtag

My regex to find a hashtag is:

String REG_EX_TAG = "[#]{1}+[A-Za-z0-9-_]+\b";
Pattern tagMatcher = Pattern.compile(REG_EX_TAG);

But if I run it on the string "today it's a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home", the result is the tags #Sweet, #Home and #sun.

I would like the result to be only the tags #Sweet and #sun.

How can I change my regex?

Upvotes: 0

Views: 4245

Answers (2)

Robo Mop

Reputation: 3553

Perhaps this could help:

".*?\\s(#\\w+).*?"

Implemented in your program as follows:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String yourString = "Today is a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home";

String REG_EX_TAG = ".*?\\s(#\\w+).*?";

Pattern tagMatcher = Pattern.compile(REG_EX_TAG);
Matcher m = tagMatcher.matcher(yourString);
if (m.find())
{
    String tag = m.group(1);
    // Whatever you want to do with the tag: store it, print it, etc.
}

m.group(1) contains the tag, because in the regex the tag is enclosed in parentheses, i.e. it is capturing group 1.
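To make the role of the capturing group concrete, here is a small sketch that compares m.group() (the whole match) with m.group(1) for the first match. The class and method names (GroupZeroVsOne, firstMatch) are illustrative only; the pattern is the one from this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroVsOne {
    // The pattern from the answer above.
    static final Pattern TAG = Pattern.compile(".*?\\s(#\\w+).*?");

    // Returns {group(0), group(1)} for the first match, or null if nothing matches.
    static String[] firstMatch(String input) {
        Matcher m = TAG.matcher(input);
        if (m.find()) {
            return new String[] { m.group(), m.group(1) };
        }
        return null;
    }

    public static void main(String[] args) {
        String[] r = firstMatch("Today is a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home");
        System.out.println("group(0): " + r[0]); // whole match, including the text before the tag
        System.out.println("group(1): " + r[1]); // just the tag
    }
}
```

On the sample string, group(0) is "Today is a beautiful sunny day #sun", because the leading .*? is part of the match, while group(1) is just "#sun".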

Regex breakdown:

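The pattern has no ^ anchor (^ would tie the match to the very start of the String). It doesn't need one to get the first tag: find() scans from the beginning of the input, so with if(m.find()) the tag matched is the first one preceded by whitespace.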

.*? is a lazy match for any sequence of characters (the non-hashtag part), i.e. words, digits, spaces, etc.

\\s requires a whitespace character right before the tag (as far as I can see, this is the condition set by the OP).

(#\\w+) is the actual tag: a # followed by one or more word characters, i.e. letters, digits, underscores, or any combination of them.

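Finally, the trailing .*? allows for more text after the hashtag. Since it is lazy and sits at the end of the pattern, it always matches the empty string, so it could also be dropped.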

Note: this regex matches hashtags that follow the typical conventions, e.g. #Blessed, #9_11 or #I_Need_MoreUpvotes: no special characters, and preceded by whitespace. A tag at the very start of the string is therefore not matched.

EDIT: to get every tag that is preceded by whitespace (here #sun and #Sweet, but not the chained #Home), replace if(m.find()) with while(m.find()).
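Putting the EDIT into a complete program: a minimal sketch that collects Group 1 of every match into a list. The class and method names (AllTagsWithWhile, extractTags) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllTagsWithWhile {
    // The pattern from the answer above.
    static final Pattern TAG = Pattern.compile(".*?\\s(#\\w+).*?");

    // Each find() resumes where the previous match ended; keep only Group 1.
    static List<String> extractTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extractTags("Today is a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home"));
        // prints [#sun, #Sweet]
        System.out.println(extractTags("#first tag at the very start"));
        // prints [] because no whitespace precedes #first
    }
}
```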

Upvotes: 1

Wiktor Stribiżew

Reputation: 626826

In a Java string literal, "\b" is a backspace character, not a word boundary. You need to double-escape it: "\\b".

Also, your pattern matches every hashtag anywhere in the string, whereas you only want the first one when several hashtags are chained together.
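Both problems can be checked directly. This is a minimal sketch (the class and helper names are made up for this illustration) that runs the question's pattern once as written and once with \\b double-escaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackspaceVsWordBoundary {
    // Returns every full match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "today it's a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home";

        // In a Java literal, "\b" is one character: U+0008 (backspace).
        System.out.println("\b".length() + " " + (int) "\b".charAt(0)); // 1 8

        // As written: the pattern ends in a literal backspace, which s never contains.
        System.out.println(findAll("[#]{1}+[A-Za-z0-9-_]+\b", s)); // []

        // Double-escaped: the regex engine receives \b, a word boundary.
        System.out.println(findAll("[#]{1}+[A-Za-z0-9-_]+\\b", s)); // [#sun, #Sweet, #Home]
    }
}
```

With the escape fixed, the pattern finds #sun, #Sweet and #Home, which shows the second problem: #Home is still reported even though it is chained to #Sweet.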

You may use

(#[A-Za-z0-9-_]+)(?:#[A-Za-z0-9-_]+)*


Details

  • (#[A-Za-z0-9-_]+) - Group 1, capturing the first occurrence of # followed by 1+ letters, digits, - or _
  • (?:#[A-Za-z0-9-_]+)* - a non-capturing group matching 0+ further hashtags chained directly after the first one, so they are consumed but not captured.

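Grab Group 1 values only. The Java demo also appends \\b (a word boundary) to the pattern; for this input the result is the same with or without it.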

See the Java demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "today it's a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home";
Pattern pattern = Pattern.compile("(#[A-Za-z0-9-_]+)(?:#[A-Za-z0-9-_]+)*\\b");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1)); // only Group 1: the first tag of each chain
}
// Prints:
// #sun
// #Sweet
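To see how the (?:#[A-Za-z0-9-_]+)* part behaves on longer chains, here is a sketch that uses Matcher.results() (available since Java 9) instead of an explicit while loop. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FirstTagOfEachChain {
    // The pattern from the Java demo above.
    static final Pattern CHAIN = Pattern.compile("(#[A-Za-z0-9-_]+)(?:#[A-Za-z0-9-_]+)*\\b");

    // Matcher.results() streams every match; keep only Group 1 of each.
    static List<String> firstTags(String input) {
        return CHAIN.matcher(input).results()
                .map(r -> r.group(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstTags("#first words #a#b#c and #d")); // [#first, #a, #d]
    }
}
```

Only the first tag of each chain (#a from #a#b#c) is kept. A tag at the very start of the string is found as well, since this pattern does not require whitespace before the #.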

Note that {1}+ in your pattern is redundant: it is a possessive {1} quantifier, which matches exactly one occurrence of the preceding subpattern, and that is already the default behavior.
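A quick check of that point on a few sample inputs (a sanity check, not a proof). The simplified pattern #[A-Za-z0-9-_]+\\b, which replaces [#]{1}+ with a plain #, and the class name are hypothetical; both patterns are compared match for match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RedundantQuantifier {
    // The question's pattern, with \b double-escaped.
    static final Pattern WITH_QUANTIFIER = Pattern.compile("[#]{1}+[A-Za-z0-9-_]+\\b");
    // Hypothetical simplified form: a plain # with no quantifier.
    static final Pattern PLAIN = Pattern.compile("#[A-Za-z0-9-_]+\\b");

    static List<String> findAll(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // True when both patterns report exactly the same matches for the input.
    static boolean sameMatches(String input) {
        return findAll(WITH_QUANTIFIER, input).equals(findAll(PLAIN, input));
    }

    public static void main(String[] args) {
        String[] samples = {
            "today it's a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home",
            "##double x#y #a-b_c",
            "no tags here"
        };
        for (String s : samples) {
            System.out.println(sameMatches(s) + "  " + findAll(PLAIN, s));
        }
    }
}
```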

Upvotes: 2
