Reputation: 4287
I want to extract Chinese Weibo usernames, so I use this code:
import re

def atExtractor(sentence):
    return re.findall("@.*\\s", sentence, re.I)
Then I run it on this sentence:
atExtractor(u"@中国联通网上营业厅 @北京地铁 北京地铁10号线,从惠新西街南口到海淀黄庄")
It returns:
[u'@中国联通网上营业厅 @北京地铁 ']
Why does the regex return only one match instead of two? The same problem happens when I try to extract hashtags:
def activityExtractor(sentence):
    return re.findall("#.*#", sentence, re.I)
activityExtractor(u"#中国联通网上营业厅# #北京地铁# 北京地铁10号线")
It returns:
[u'#中国联通网上营业厅# #北京地铁# ']
Upvotes: 0
Views: 91
Reputation: 174706
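Because your pattern is greedy: .* matches as much as it can, so the match runs from the first @ to the last whitespace character in the string. Either make the quantifier lazy and stop at the next whitespace: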
re.findall("@.*?(?=\\s)", sentence, re.I)
or
re.findall(r"@\S*", sentence, re.I)
\S* matches zero or more non-whitespace characters, so each match ends at the first whitespace after the @.
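The same idea carries over to the hashtag case from the question. Below is a minimal sketch, assuming Python 3; the function names extract_mentions and extract_hashtags are hypothetical, not from the original post. It uses \S+ rather than \S* so that a lone "@" is not returned, and a negated character class [^#]+ for hashtags (a lazy #.+?# should behave the same way on this input).

```python
import re

# Hypothetical helper: one match per @mention.
# \S+ stops at the first whitespace, so greedy matching cannot span two mentions.
def extract_mentions(sentence):
    return re.findall(r"@\S+", sentence)

# Hypothetical helper: one match per #hashtag#.
# [^#]+ cannot cross a '#', so each match ends at the nearest closing '#'.
def extract_hashtags(sentence):
    return re.findall(r"#[^#]+#", sentence)

mentions = extract_mentions("@中国联通网上营业厅 @北京地铁 北京地铁10号线,从惠新西街南口到海淀黄庄")
hashtags = extract_hashtags("#中国联通网上营业厅# #北京地铁# 北京地铁10号线")
print(mentions)  # ['@中国联通网上营业厅', '@北京地铁']
print(hashtags)  # ['#中国联通网上营业厅#', '#北京地铁#']
```

Unlike the lookahead version, @\S+ also matches a mention at the very end of the string, where no whitespace follows it.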
Upvotes: 6