Reputation: 5845
I need to extract hashtag strings from a source String in Java. Any ideas or examples?
Thanks, Sri
Upvotes: 5
Views: 5490
Reputation: 597432
Here is what I'm using (it handles Unicode tags as well, not only ASCII):
import java.util.regex.Pattern;

private static final Pattern TAG_PATTERN =
    Pattern.compile("(?:^|\\s|[\\p{Punct}&&[^/]])(#[\\p{L}0-9-_]+)");
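For completeness, here is a minimal sketch of how that pattern might be applied with a Matcher to collect the tags. The class name HashtagExtractor and the method name extract are just illustrative, not part of any library. The hashtag itself is in capture group 1; the non-capturing prefix only checks that the tag starts at the beginning of the input, after whitespace, or after punctuation other than "/" (so URL fragments like http://example.com/#anchor are skipped).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashtagExtractor {
    // Same pattern as above: group 1 holds the hashtag, including the leading '#'.
    private static final Pattern TAG_PATTERN =
        Pattern.compile("(?:^|\\s|[\\p{Punct}&&[^/]])(#[\\p{L}0-9-_]+)");

    // Hypothetical helper: returns every hashtag found in text, in order of appearance.
    public static List<String> extract(String text) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = TAG_PATTERN.matcher(text);
        while (matcher.find()) {
            tags.add(matcher.group(1)); // group 1 excludes the preceding space/punctuation
        }
        return tags;
    }

    public static void main(String[] args) {
        // prints [#world, #java-8]
        System.out.println(extract("Hello #world and #java-8!"));
    }
}
```

Note that two tags written back to back with no separator (e.g. "#a#b") yield only the first one with this pattern, which may or may not be what you want.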
Btw, you should be able to get the hashtags from the tweet entities (include_entities=true).
Upvotes: 9