Reputation: 13
I need to match Twitter hashtags within an Android app, but my code doesn't do what it's supposed to. What I came up with is:
ArrayList<String> tags = new ArrayList<String>(0);
Pattern p = Pattern.compile("\b#[a-z]+", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(tweet); // tweet contains the tweet as a String
while (m.find()) {
    tags.add(m.group());
}
The variable tweet contains a regular tweet including hashtags, but find() never returns true. So I guess my regular expression is wrong.
Upvotes: 1
Views: 1373
Reputation: 336138
Your regex fails because of the \b word boundary anchor. This anchor only matches between a word character (a letter, digit, or underscore) and a non-word character. Since # is itself a non-word character, putting \b directly in front of it causes the regex to fail unless there is a word character immediately before the #. Your regex would match a hashtag in foobarfoo#hashtag blahblahblah but not in foobarfoo #hashtag blahblahblah.
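To make the boundary behaviour concrete, here is a small sketch that runs the pattern (with the backslash doubled so that it really is a word boundary) against the two example strings. The class and method names (WordBoundaryDemo, findsHashtag) are made up for illustration.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean findsHashtag(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        // "\\b" in Java source is the regex token \b (word boundary).
        String withBoundary = "\\b#[a-z]+";

        // A word character sits directly before '#', so \b matches there.
        System.out.println(findsHashtag(withBoundary, "foobarfoo#hashtag blahblahblah")); // true

        // A space sits before '#': two non-word characters, so no boundary.
        System.out.println(findsHashtag(withBoundary, "foobarfoo #hashtag blahblahblah")); // false
    }
}
```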
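Use #\w+ instead, and remember that inside a Java string literal you need to double the backslashes (a single "\b" in a string literal is the backspace character, not a word boundary):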
Pattern p = Pattern.compile("#\\w+");
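Put together with the loop from the question, a minimal sketch could look like this. The class and method names (HashtagExtractor, extractHashtags) are placeholders I picked, not anything from the Android or Twitter APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagExtractor {
    // '#' followed by one or more word characters (letters, digits, underscore).
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Hypothetical helper: collects every hashtag, including the leading '#'.
    static List<String> extractHashtags(String tweet) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(tweet);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extractHashtags("foobarfoo #hashtag blahblahblah")); // [#hashtag]
    }
}
```

Keep in mind that #\w+ also matches a # in the middle of a word, so foobarfoo#hashtag would still yield #hashtag; whether that is acceptable depends on your tweets.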
Upvotes: 3
Reputation: 10479
Your pattern should be "#(\\w+)" if you just want to match the hashtag. With this pattern and the tweet "retweet pizza to #pizzahut", after a successful m.find(), m.group() returns "#pizzahut" and m.group(1) returns "pizzahut".
Edit: Note that the HTML display may be mangling the escape backslashes; in your Java string literal you need two backslashes before the w.
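As a rough sketch of the difference between group() and group(1), assuming you want the tag names without the leading #. The names HashtagGroups and tagNames are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagGroups {
    // Group 1 captures the tag text after the '#'.
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Hypothetical helper: returns tag names without the leading '#'.
    static List<String> tagNames(String tweet) {
        List<String> names = new ArrayList<>();
        Matcher m = HASHTAG.matcher(tweet);
        while (m.find()) {
            // m.group() would be the whole match, e.g. "#pizzahut".
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(tagNames("retweet pizza to #pizzahut")); // [pizzahut]
    }
}
```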
Upvotes: 2