Reputation: 35
I wrote this code to search for an exact word (%PDF-1.1) in a text:
import re

x = "%PDF-1.1 pdf file contains four parts one of them the header part which looks like "
s = re.compile(r"%PDF-\d\.\d[\b\s]")
match = s.search(x)
if match:
    print(match.group())
else:
    print("its not found")
The problem is that if I have "s%PDF-1.1", it still returns %PDF-1.1, which is wrong. And when x = "pdf file contains four parts one of them the header part which looks like %PDF-1.1", it returns nothing.
How can I search for the exact word?
Upvotes: 0
Views: 867
Reputation: 2323
At the moment, you are searching for the word "%PDF-X.X" (where X is a digit) followed by one more character, without caring about what comes before it. If you want to match this word only when it is at the beginning of the string, at the end of the string, or a separate word (I assume with whitespace before and after it), you can try this:
import re

x = "%PDF-1.1 pdf file contains four parts one of them the header part which looks like "
y = "pdf file contains four parts one of them the header part which looks like %PDF-1.1"

# The word must be preceded by the start of the string or whitespace,
# and followed by the end of the string or whitespace.
s = re.compile(r"(^|\s)(?P<myword>%PDF-\d\.\d)($|\s)")

match = s.search(x)
if match:
    print(match.group("myword"))
else:
    print("its not found")

match = s.search(y)
if match:
    print(match.group("myword"))
else:
    print("its not found")

# %PDF-1.1
# %PDF-1.1
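To see why the pattern from the question behaves the way it does, here is a small side-by-side sketch. Inside a character class, `\b` means a backspace character rather than a word boundary, so `[\b\s]` effectively requires a whitespace character after the word, and nothing constrains what comes before it. The shortened sample strings and the variable names `original` and `guarded` are only for illustration.

```python
import re

# The pattern from the question. Inside [...], \b is a backspace character,
# not a word boundary, so the class only matches whitespace or backspace.
original = re.compile(r"%PDF-\d\.\d[\b\s]")

# The pattern from this answer, guarded on both sides.
guarded = re.compile(r"(^|\s)(?P<myword>%PDF-\d\.\d)($|\s)")

samples = [
    "%PDF-1.1 pdf file contains four parts",  # word at the start
    "the header part looks like %PDF-1.1",    # word at the end
    "s%PDF-1.1 pdf file",                     # glued to a preceding letter
]

for text in samples:
    # Columns: question's pattern matched?, answer's pattern matched?, input
    print(bool(original.search(text)), bool(guarded.search(text)), repr(text))
```

The question's pattern matches the first and third samples but misses the second, while the guarded pattern matches the first two and rejects the third.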
If you also want the word to be found when it is followed by a symbol, you can use something like this, which allows it to be followed by anything that is not a letter or a digit:
s = re.compile(r"(^|\s)(?P<myword>%PDF-\d\.\d)($|\s|[^a-zA-Z0-9])")
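As an alternative sketch (not the pattern above, but a variant with the same intent), the same checks can be written with lookaround assertions. These test the neighbouring characters without making them part of the match, so no named group is needed. The names `PDF_WORD` and `find_pdf_header` are made up for this example.

```python
import re

# (?<!\S)          : preceded by the start of the string or whitespace
# (?![a-zA-Z0-9])  : not followed by a letter or a digit
PDF_WORD = re.compile(r"(?<!\S)%PDF-\d\.\d(?![a-zA-Z0-9])")


def find_pdf_header(text):
    """Return the matched word, or None if it is absent (hypothetical helper)."""
    match = PDF_WORD.search(text)
    return match.group() if match else None


for text in ["%PDF-1.1, is the header", "looks like %PDF-1.1",
             "s%PDF-1.1 pdf file", "%PDF-1.15"]:
    print(repr(text), find_pdf_header(text))
```

With these inputs the first two strings should yield %PDF-1.1, while "s%PDF-1.1 pdf file" and "%PDF-1.15" should yield None, because the word is glued to a letter before it or a digit after it.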
Upvotes: 1