hidemyname

Reputation: 4287

Why does this regex match only once?

I want to extract usernames from Chinese Weibo posts, so I use this code:

import re

def atExtractor(sentence):
    return re.findall("@.*\\s", sentence, re.I)

Then I call it on this sentence:

atExtractor(u"@中国联通网上营业厅 @北京地铁 北京地铁10号线,从惠新西街南口到海淀黄庄")

It returns:

[u'@中国联通网上营业厅 @北京地铁 ']

Why does the regex return only one match instead of two? The same problem happens when I try to extract hashtags:

def activityExtractor(sentence):
    return re.findall("#.*#", sentence, re.I)

activityExtractor(u"#中国联通网上营业厅# #北京地铁# 北京地铁10号线")

It returns:

[u'#中国联通网上营业厅# #北京地铁# ']

Upvotes: 0

Views: 91

Answers (1)

Avinash Raj

Reputation: 174706

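Because your pattern is greedy: .* matches as much as possible, so it runs to the last whitespace character in the string instead of stopping at the first one. Either make the quantifier lazy: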

re.findall("@.*?(?=\\s)", sentence, re.I)

or

re.findall(r"@\S*", sentence, re.I)

\S* matches zero or more non-whitespace characters, so each match ends at the first whitespace after the @.
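For reference, here is a minimal sketch of both extractors with non-greedy-style patterns, assuming Python 3. The names at_extractor and activity_extractor are just illustrative renames of the functions in the question, and the #[^#]+# pattern for hashtags is my own suggestion by analogy (it was not part of the original fix); it assumes a hashtag never contains a # inside it.

```python
import re

def at_extractor(sentence):
    # \S+ cannot cross whitespace, so each @mention is matched on its own.
    return re.findall(r"@\S+", sentence)

def activity_extractor(sentence):
    # [^#]+ cannot cross a '#', so each #...# pair is matched on its own
    # (assumption: hashtags are delimited by '#' and contain no '#' inside).
    return re.findall(r"#[^#]+#", sentence)

mentions = at_extractor(u"@中国联通网上营业厅 @北京地铁 北京地铁10号线,从惠新西街南口到海淀黄庄")
tags = activity_extractor(u"#中国联通网上营业厅# #北京地铁# 北京地铁10号线")

print(mentions)  # ['@中国联通网上营业厅', '@北京地铁']
print(tags)      # ['#中国联通网上营业厅#', '#北京地铁#']
```

Using + instead of * skips a bare @ with nothing after it. The re.I flag is dropped here because neither pattern contains letters, so it has no effect.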

Upvotes: 6
