Reputation: 117
I'm trying to write a Python script that finds and counts occurrences of a word in a string. The word is "@sosiora". I found some examples, but they match "sosiora" rather than "@sosiora". Here is my script:
#!/usr/bin/python
import re
words = ["@sosiora"]
exactMatch = re.compile(r'\b%s\b' % '\\b|\\b'.join(words), flags=re.IGNORECASE)
print len(exactMatch.findall("@riky ini adalah @sosiora dengan huruf s "))
I don't know why, but it always prints 0. Please help, I'm new to Python. Thank you.
Extra: I have edited my code, but I ran into another problem. How do I extract the word once I have found it? Here is my code now:
#!/usr/bin/python
import re
words = ["@sosiora","@sosiora#1","@sosiora#2","@sosiora#3","@sosiora#4","@sosiora#5"]
exactMatch = re.compile('|'.join(words), flags=re.IGNORECASE)
print len(exactMatch.findall("@riky ini adalah @Sosiora#1 dengan huruf s "))
If I find "@sosiora#1" or "@sosiora#2", how do I extract the number? I need that number.
Upvotes: 1
Views: 676
Reputation: 4958
The regex you're compiling is wrong... This should work better:
#!/usr/bin/python
import re
words = [r"(@sosiora#(\d+))"]  # raw string, so \d reaches the regex engine unchanged
exactMatch = re.compile('|'.join(words), flags=re.IGNORECASE)
text = "@riky ini adalah @Sosiora#1 dengan huruf s"
m = exactMatch.findall(text)
print 'Found %d matches' % len(m)
print 'First word found: ' + m[0][0] # @Sosiora#1
print 'First index found: ' + m[0][1] # 1
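If some mentions carry no number at all, a variation along these lines might help. It is only a sketch in Python 3 syntax (the code above uses the Python 2 print statement); the optional group, the (?!\w) lookahead and the longer sample text are my own assumptions, not part of the question.

```python
import re

# Optional "#<digits>" suffix; (?!\w) keeps "@sosiorax" from matching.
pattern = re.compile(r'@sosiora(?:#(\d+))?(?!\w)', flags=re.IGNORECASE)

# Hypothetical sample text, extended from the one in the question.
text = "@riky ini adalah @Sosiora#1 dengan huruf s, lalu @sosiora dan @sosiora#25"

# finditer yields match objects: group(0) is the whole mention,
# group(1) is the number as a string, or None when there is no "#<digits>".
results = [(m.group(0), m.group(1)) for m in pattern.finditer(text)]

# Keep only the mentions that carry a number and convert them to int.
numbers = [int(n) for _, n in results if n is not None]

print(results)
print(numbers)
```

Using finditer instead of findall gives you the match objects, so m.start() and m.end() are also available if you need the positions.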
Upvotes: 3
Reputation: 22457
It has nothing to do with Python; your regex itself is wrong.
The GREP code \b matches a word boundary, that is, it will match if on one side there is a "word character" and on the other side there is none. The character @ is not a word character (it does not get matched with \w) and so your regular expression is expecting something like abc@sosiora (with a word character left of the @).
Fix it by removing the left \b from your regular expression.
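A quick way to see the difference, as a rough sketch in Python 3 syntax using the sample string from the question (the variable names are mine):

```python
import re

text = "@riky ini adalah @sosiora dengan huruf s "

# Original shape of the pattern: \b before "@" requires a word character
# immediately to the left of "@", which a space is not.
with_left_boundary = re.compile(r'\b@sosiora\b', flags=re.IGNORECASE)

# Left \b removed: "@" may now follow a space or the start of the string.
without_left_boundary = re.compile(r'@sosiora\b', flags=re.IGNORECASE)

count_with = len(with_left_boundary.findall(text))
count_without = len(without_left_boundary.findall(text))

# The original pattern only matches when "@" is glued to a word character.
count_glued = len(with_left_boundary.findall("abc@sosiora"))

print(count_with, count_without, count_glued)  # 0 1 1
```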
Upvotes: 1