Nilani Algiriyage

Reputation: 35696

Avoid hex-encoding in a text file for a searching program in Python

I have written a Python program that analyzes a server log (a text file) and finds the strings that do not match a user's input. However, the program does not take hex-encoded strings into account. For example, in the following case the program reports that there are no non-matching values for the user input, although 'www.peoplesmonton.com' is present in the file. How can I avoid this?

for line in lines:
    match = re.search('\\b' + userinput + '\\b',line)

sample text file:

https://www.mysite.com/myworks/accaply/inquiry.asp 
http://www.peoplesmonton.com/amb/cgi-bin/bank/bank/ambt%20Bank%20Of%20Frnak%20PLC_asp.htm 
http://www.peoplesmonton.com/comblk/cgi-bin/bank/bank/ambt%20Bank%20Of%20ambt%20PLC_asp.htm 

Upvotes: 0

Views: 61

Answers (1)

Martijn Pieters

Reputation: 1122492

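The data is URL-encoded (percent-encoded, e.g. %20 for a space), so use urllib2.unquote to decode it before searching. In Python 3 the same function is available as urllib.parse.unquote.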

>>> input = '''\
... https://www.mysite.com/myworks/accaply/inquiry.asp 
... http://www.peoplesmonton.com/amb/cgi-bin/bank/bank/ambt%20Bank%20Of%20Frnak%20PLC_asp.htm 
... http://www.peoplesmonton.com/comblk/cgi-bin/bank/bank/ambt%20Bank%20Of%20ambt%20PLC_asp.htm 
... '''
>>> import urllib2
>>> print urllib2.unquote(input)
https://www.mysite.com/myworks/accaply/inquiry.asp 
http://www.peoplesmonton.com/amb/cgi-bin/bank/bank/ambt Bank Of Frnak PLC_asp.htm 
http://www.peoplesmonton.com/comblk/cgi-bin/bank/bank/ambt Bank Of ambt PLC_asp.htm 
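
To tie the decoding back to the search loop in the question, here is a minimal Python 3 sketch that decodes each line before matching. The names find_matching_lines and log_lines are hypothetical, and the use of re.escape assumes the user input should be treated as literal text rather than as a regular expression; drop it if regex input is intended.

```python
import re
from urllib.parse import unquote  # Python 3 location of unquote


def find_matching_lines(lines, userinput):
    """Return the decoded lines that contain userinput as a whole word.

    Hypothetical helper: each line is percent-decoded first, so that
    '%20' becomes a space before the regular expression is applied.
    """
    # Assumption: userinput is literal text, so escape regex metacharacters.
    pattern = re.compile(r'\b' + re.escape(userinput) + r'\b')
    matches = []
    for line in lines:
        decoded = unquote(line)
        if pattern.search(decoded):
            matches.append(decoded.strip())
    return matches


# Sample lines taken from the log in the question.
log_lines = [
    'https://www.mysite.com/myworks/accaply/inquiry.asp ',
    'http://www.peoplesmonton.com/amb/cgi-bin/bank/bank/ambt%20Bank%20Of%20Frnak%20PLC_asp.htm ',
    'http://www.peoplesmonton.com/comblk/cgi-bin/bank/bank/ambt%20Bank%20Of%20ambt%20PLC_asp.htm ',
]

for hit in find_matching_lines(log_lines, 'Bank Of Frnak'):
    print(hit)
```

Decoding line by line keeps the loop structure from the question intact; only the text handed to re.search changes.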

Upvotes: 2
