Reputation: 1271
My regex to find a hashtag is:
String REG_EX_TAG = "[#]{1}+[A-Za-z0-9-_]+\b";
Pattern tagMatcher = Pattern.compile(REG_EX_TAG);
But if I input the string "today it's a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home"
the result is:
the tags #Sweet, #Home, #sun
I would like the result to be only the tags #Sweet and #sun.
How can I change my regex?
Upvotes: 0
Views: 4245
Reputation: 3553
Perhaps this could help:
".*?\\s(#\\w+).*?"
Implemented in your program as follows:
String YourString = "Today is a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home";
String REG_EX_TAG = ".*?\\s(#\\w+).*?";
Pattern tagMatcher = Pattern.compile(REG_EX_TAG);
Matcher m = tagMatcher.matcher(YourString);
if (m.find()) {
    String tag = m.group(1);
    // Whatever you want to do with the tag - store it, print it, etc.
}
m.group(1) contains the tag, because that part of the regex is enclosed in parentheses (a capturing group).
^ symbolizes the very start of the String, so that the tag matched is the very first one (note that it is not part of the pattern shown above).
.*? is a lazy match for any sequence of characters (the non-hashtag part), i.e. words, digits, spaces, etc.
\\s tells the regex to match the tag only when whitespace precedes it (as far as I can see, this is the condition set by the OP).
(#\\w+) is the actual tag, indicated by a # and one or more word characters, i.e. letters, digits, underscores, or a combination of them.
Finally, the trailing .*? indicates that there may be some more text after the hashtag.
Note - This regex will match the typical conventions of a hashtag, i.e. #Blessed or #9_11 or #I_Need_MoreUpvotes, without any special characters, and preceded by a space.
EDIT - To match all tags, just replace the if(m.find()) with while(m.find()).
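For reference, here is a minimal self-contained sketch of the while(m.find()) variant. The class name WhitespaceTagDemo and the helper findTags are made up for illustration; only the pattern itself comes from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, used only for this illustration.
class WhitespaceTagDemo {
    // Same pattern as in the answer: a tag counts only if whitespace precedes it.
    static final Pattern TAG = Pattern.compile(".*?\\s(#\\w+).*?");

    // Collects group 1 (the tag) from every successive match.
    static List<String> findTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        String s = "Today is a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home";
        System.out.println(findTags(s)); // prints [#sun, #Sweet]
    }
}
```

Because the pattern requires whitespace directly before the #, #Home is skipped here, and a tag at the very start of the string would be skipped as well.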
Upvotes: 1
Reputation: 626826
The "\b" in your Java string literal is a backspace char, not a word boundary. You need to double escape it, i.e. write "\\b".
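A quick way to see the difference is the sketch below. The class name BackspaceVsBoundary and the helper hasTag are invented for this illustration, and the simplified #\\w+ prefix is an assumption standing in for the full tag pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class BackspaceVsBoundary {
    // Appends the given suffix to a simple tag pattern and reports whether it finds a match.
    static boolean hasTag(String suffix, String input) {
        return Pattern.compile("#\\w+" + suffix).matcher(input).find();
    }

    public static void main(String[] args) {
        // "\b" in a Java string literal is the single backspace character (U+0008).
        System.out.println((int) "\b".charAt(0));    // prints 8
        // The regex then demands a literal backspace after the tag, so nothing matches.
        System.out.println(hasTag("\b", "#sun."));   // prints false
        // "\\b" reaches the regex engine as \b, a word boundary.
        System.out.println(hasTag("\\b", "#sun."));  // prints true
    }
}
```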
Also, the pattern matches every hashtag anywhere in the string, whereas you need only the first one when there is a chain of hashtags.
You may use
(#[A-Za-z0-9-_]+)(?:#[A-Za-z0-9-_]+)*
See the regex demo.
Details
(#[A-Za-z0-9-_]+) - Group 1, capturing the first occurrence of # followed by 1+ letters, digits, - or _
(?:#[A-Za-z0-9-_]+)* - matches 0+ repetitions of the hashtag pattern.
Grab Group 1 values only.
See the Java demo:
String s = "today it's a beautiful sunny day #sun. Hello my name is Mat #Sweet#Home";
Pattern pattern = Pattern.compile("(#[A-Za-z0-9-_]+)(?:#[A-Za-z0-9-_]+)*\\b");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
// prints #sun and #Sweet, one per line
Note that {1}+ is redundant: it matches exactly 1 occurrence of the quantified subpattern, which is the default behavior anyway.
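As a rough check of that claim, the sketch below runs the original prefix and the simplified one over the same input. The class name RedundantQuantifierDemo and the helper allMatches are hypothetical, and the trailing \b is left out to keep the comparison focused on the quantifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, used only to compare the two spellings.
class RedundantQuantifierDemo {
    // Returns every full match of the regex in the text, in order.
    static List<String> allMatches(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "sunny day #sun. Hello my name is Mat #Sweet#Home";
        // With the {1}+ quantifier on the # ...
        System.out.println(allMatches("[#]{1}+[A-Za-z0-9-_]+", s)); // prints [#sun, #Sweet, #Home]
        // ... and with a plain #: the matches are identical.
        System.out.println(allMatches("#[A-Za-z0-9-_]+", s));       // prints [#sun, #Sweet, #Home]
    }
}
```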
Upvotes: 2