Reputation: 35696
I have written a Python program that analyzes a server log (a text file) and finds strings that do not match a user input. However, the program does not take hex-encoded (percent-encoded) strings into account. For example, in the following case the program reports that there are no values that fail to match the user input, even though 'www.peoplesmonton.com' is present. How can I fix this?
for line in lines:
    match = re.search('\\b' + userinput + '\\b', line)
Sample text file:
https://www.mysite.com/myworks/accaply/inquiry.asp
http://www.peoplesmonton.com/amb/cgi-bin/bank/bank/ambt%20Bank%20Of%20Frnak%20PLC_asp.htm
http://www.peoplesmonton.com/comblk/cgi-bin/bank/bank/ambt%20Bank%20Of%20ambt%20PLC_asp.htm
Upvotes: 0
Views: 61
Reputation: 1122492
The data is URL-encoded (percent-encoded), so use urllib2.unquote to decode it before matching:
>>> input = '''\
... https://www.mysite.com/myworks/accaply/inquiry.asp
... http://www.peoplesmonton.com/amb/cgi-bin/bank/bank/ambt%20Bank%20Of%20Frnak%20PLC_asp.htm
... http://www.peoplesmonton.com/comblk/cgi-bin/bank/bank/ambt%20Bank%20Of%20ambt%20PLC_asp.htm
... '''
>>> import urllib2
>>> print urllib2.unquote(input)
https://www.mysite.com/myworks/accaply/inquiry.asp
http://www.peoplesmonton.com/amb/cgi-bin/bank/bank/ambt Bank Of Frnak PLC_asp.htm
http://www.peoplesmonton.com/comblk/cgi-bin/bank/bank/ambt Bank Of ambt PLC_asp.htm
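To tie this back to the loop in the question, here is a minimal sketch that decodes each line before running the regex. It is written for Python 3, where the same function lives at urllib.parse.unquote. The function name find_non_matching, the sample list, and the use of re.escape (which treats the user input as literal text rather than as a regex) are assumptions for illustration, not part of the original program.

```python
import re
from urllib.parse import unquote  # Python 3; in Python 2 use urllib2.unquote


def find_non_matching(lines, userinput):
    """Return the decoded lines that do NOT contain userinput as a whole word."""
    # re.escape is an assumption: it makes the dots in a host name literal.
    pattern = re.compile(r'\b' + re.escape(userinput) + r'\b')
    non_matching = []
    for line in lines:
        # Decode %20 and other percent-escapes before searching.
        decoded = unquote(line.strip())
        if decoded and not pattern.search(decoded):
            non_matching.append(decoded)
    return non_matching


# Hypothetical input: the three lines from the sample text file.
sample = [
    "https://www.mysite.com/myworks/accaply/inquiry.asp",
    "http://www.peoplesmonton.com/amb/cgi-bin/bank/bank/ambt%20Bank%20Of%20Frnak%20PLC_asp.htm",
    "http://www.peoplesmonton.com/comblk/cgi-bin/bank/bank/ambt%20Bank%20Of%20ambt%20PLC_asp.htm",
]

result = find_non_matching(sample, "www.mysite.com")
for url in result:
    print(url)
```

With "www.mysite.com" as the user input, the two peoplesmonton lines are reported as non-matching, in decoded form.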
Upvotes: 2