Reputation: 6408
I'm writing a script to search a logfile for a given Python regex pattern. Setting aside the fact that this would be much easier to do with a simple Bash script, can it be done in Python? Here's what I've run into:
The log file is /var/log/auth.log.
The script is called logscour.
logscour takes only one arg, called regex_in.
[root@localhost]: # logscour '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
should return the lines of /var/log/auth.log that contain an IPv4 address.
I want to find a sort of anti-re.escape(), as I am in backslash-hell. Here's a snippet:
import re
import argparse

def main(regex_in, logfile='/var/log/auth.log'):
    ## Herein lies the problem!
    # user_regex_string = re.escape(regex_in) #<---DOESN'T WORK, EVEN MORE ESCAPE-SLASHES
    # user_regex_string = r'{}'.format(regex_in) #<---DOESN'T WORK
    user_regex_string = regex_in #<---DOESN'T WORK EITHER GAHHH
    with open(logfile, 'rb+') as authlog:
        for aline in authlog:
            if re.match(user_regex_string, aline):
                print aline

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("regex_in", nargs="?", help="enter a python-compliant regex string. Parentheses & matching groups not supported.", default=None)
    args = parser.parse_args()
    if not args.regex_in:
        raise argparse.ArgumentError('regex_in', message="you must supply a regex string")
    main(args.regex_in)
This gives me back nothing, as one would expect given that I'm using Python 2.7 and these are bytestrings I'm dealing with.
Does anyone know a way to convert 'foo' to r'foo', or an "opposite" for re.escape()?
Upvotes: 3
Views: 6524
Reputation: 54233
user_regex_string = re.compile(regex_in)
and
re.search(user_regex_string, aline)
should work fine. You need re.search instead of re.match because re.match only matches at the start of the string, and the IP address isn't necessarily at the start of a line.
I always find re.match very convenient for introducing subtle bugs in my code. :)
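As for the "anti-re.escape()": a small sketch may show why none is needed. The r prefix only changes how a literal in source code is parsed, so a pattern that arrives on the command line already holds its backslashes and can go straight to re.compile. The snippet also contrasts re.match with re.search on a sample sshd line. The variable names are made up for illustration.

```python
import re

# The string the script receives for: logscour '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
# It is spelled here once as a normal literal (doubled backslashes) and once
# as a raw literal. Both denote the same string, so there is nothing to convert.
from_argv = '\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}'
raw_literal = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
same_string = (from_argv == raw_literal)

pattern = re.compile(from_argv)
line = 'May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158'

# match() only tries at the start of the line, which begins with "May".
match_result = pattern.match(line)

# search() scans the whole line and finds the address near the end.
search_result = pattern.search(line)

# re.escape() goes the wrong way: the escaped pattern looks for the literal
# text of the regex itself, which does not occur in the log line.
escaped_result = re.compile(re.escape(from_argv)).search(line)

print(same_string, match_result, search_result.group(0), escaped_result)
```

If the pattern still seems to lose its backslashes, the shell quoting is the more likely culprit: single quotes, as in the question, pass them through untouched.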
On my server, logscour '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' outputs:
May 28 17:38:53 dmzXX sshd[1736]: Received disconnect from 123.200.20.158: 11: Bye Bye [preauth]
May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158
...
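For reference, here is a minimal sketch of how the whole script could look with re.compile and re.search in place. It is not the exact script I ran. It assumes Python 3 (the `__future__` import should keep it working on 2.7), and the `scour` helper and the `--logfile` option are hypothetical names added for illustration.

```python
from __future__ import print_function  # no effect on Python 3, needed on 2.7

import argparse
import re


def scour(regex_in, lines):
    """Yield every line in `lines` that contains a match for `regex_in`."""
    pattern = re.compile(regex_in)  # compile once, reuse for every line
    for line in lines:
        if pattern.search(line):    # search, not match: a hit anywhere counts
            yield line


def main(argv=None):
    parser = argparse.ArgumentParser()
    # A required positional lets argparse report a missing pattern by itself.
    parser.add_argument("regex_in", help="a Python regex; single-quote it in the shell")
    # Hypothetical extra option, not part of the original script.
    parser.add_argument("--logfile", default="/var/log/auth.log")
    args = parser.parse_args(argv)
    with open(args.logfile) as authlog:  # read-only, text mode
        for line in scour(args.regex_in, authlog):
            print(line, end="")          # each line already ends with a newline


# Quick check on in-memory lines (the second line is synthetic).
sample = [
    "May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158\n",
    "no address on this line\n",
]
hits = list(scour(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', sample))
```

To use it as logscour, call main() under an `if __name__ == '__main__':` guard at the bottom of the file. Making regex_in a required positional also removes the need for the manual argparse.ArgumentError check.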
That being said, grep -P 'pattern' file would also work:
grep -P "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" /var/log/auth.log
-P stands for:
-P, --perl-regexp Interpret PATTERN as a Perl regular expression (PCRE, see below). This is highly experimental and grep -P may warn of unimplemented features.
-P is needed in order to interpret \d as [0-9].
Upvotes: 3