bigl

Reputation: 1083

Python - splitting a log and searching for something specific

I have an assignment and was wondering if you could help. Part of the question requires me to analyse a system log. The log contains information such as the date and time, whether root access was attempted, and the IP address the attempt came from.

My question is: how do I loop through the log and pull out the IP addresses?

myFile = open('syslog', 'r')
for line in myFile.readlines():
    # split each log line into space-separated fields
    list_of_line = line.split(' ')

So here I've split each line into a list, but how can I loop through it to locate an IP address? Previously I used fixed positions in the list, but this isn't practical as it only finds one address. I want to search through the whole log and find all the addresses. Would that mean looking for strings of a certain shape, e.g. xxx.xxx.xx.xx, and specifying that I am looking for numeric values?

edit-

Jan 10 09:32:07 j4-be03 sshd[3876]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.241.173.35  user=root
Jan 10 09:32:09 j4-be03 sshd[3876]: Failed password for root from 218.241.173.35 port 47084 ssh2
Jan 10 09:32:17 j4-be03 sshd[3879]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.241.173.35  user=root
Jan 10 09:32:19 j4-be03 sshd[3879]: Failed password for root from 218.241.173.35 port 47901 ssh2   
Jan 10 09:32:26 j4-be03 sshd[3881]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.241.173.35  user=root
Jan 10 09:32:29 j4-be03 sshd[3881]: Failed password for root from 218.241.173.35 port 48652 ssh2

I've been told to ignore the lines containing pam_unix and focus on the lines containing "Failed password for root", as they are duplicate entries. I'm about to try the regular expression answer now, although I really don't understand what is going on in it.

Upvotes: 0

Views: 2892

Answers (3)

jfs

Reputation: 414315

  • To preselect lines that contain a certain string, you could use: if s in line
  • To extract the IP, you could exploit the fact that you know the strings that come before and after it

Example

prefix = "Failed password for root from"
def extract_ip(line):
    # get string between `prefix` and 'port'
    return line.partition(prefix)[2].partition('port')[0].strip()

with open('syslog') as f:
    ips = [extract_ip(line) for line in f if prefix in line]

In general, tokenizing input like this is a job for a regex.
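As a rough sketch of that regex route, the pattern below does the same job as the partition example: it only matches the "Failed password for root" lines and captures the address between "from" and "port". The names FAILED_ROOT and failed_root_ips are made up for this illustration, and the sample lines are copied from the question.

```python
import re

# Hypothetical pattern name. The parentheses capture a dotted quad
# (four groups of 1-3 digits) sitting between "from" and "port".
FAILED_ROOT = re.compile(
    r"Failed password for root from (\d{1,3}(?:\.\d{1,3}){3}) port"
)

def failed_root_ips(lines):
    """Return the source address of every failed root login, in log order."""
    ips = []
    for line in lines:
        match = FAILED_ROOT.search(line)
        if match:                      # pam_unix lines do not match and are skipped
            ips.append(match.group(1))
    return ips

# Two lines taken from the log excerpt in the question.
sample = [
    "Jan 10 09:32:07 j4-be03 sshd[3876]: pam_unix(sshd:auth): authentication failure; "
    "logname= uid=0 euid=0 tty=ssh ruser= rhost=218.241.173.35  user=root",
    "Jan 10 09:32:09 j4-be03 sshd[3876]: Failed password for root from 218.241.173.35 port 47084 ssh2",
]
print(failed_root_ips(sample))  # ['218.241.173.35']
```

To run it on the real file, pass the open file object instead of the sample list, since iterating over a file yields its lines.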

Upvotes: 1

user850498

Reputation: 727

import re

# read the whole log and collect every dotted quad that looks like an IPv4 address
with open('syslog') as myFile:
    ips = re.findall(r'[0-9]+(?:\.[0-9]+){3}', myFile.read())
print(ips)

Don't you just love python?
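Since the pattern is terse, here is the same expression spelled out with re.VERBOSE, which lets a regex carry whitespace and comments. The extra check with the standard ipaddress module is my own addition, not part of the answer: the pattern alone would also accept something like 999.1.2.3, so the sketch drops candidates that are not valid IPv4 addresses. IP_PATTERN and find_ips are illustrative names, and Python 3 is assumed.

```python
import ipaddress
import re

IP_PATTERN = re.compile(r"""
    [0-9]+          # one or more digits: the first number
    (?:             # non-capturing group, so findall returns the whole match
        \.          # a literal dot
        [0-9]+      # one or more digits: the next number
    ){3}            # the group must repeat exactly three times
""", re.VERBOSE)

def find_ips(text):
    """Return every dotted quad in text that is also a valid IPv4 address."""
    valid = []
    for candidate in IP_PATTERN.findall(text):
        try:
            ipaddress.IPv4Address(candidate)   # raises ValueError on e.g. 999.1.2.3
        except ValueError:
            continue
        valid.append(candidate)
    return valid

print(find_ips("Failed password for root from 218.241.173.35 port 47084 ssh2"))
```

Note that this finds addresses on every line, including the pam_unix ones, so each attempt in the sample log would be reported twice unless those lines are filtered out first.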

Upvotes: 0

Elliot Bonneville

Reputation: 53311

One solution (albeit a little unwieldy) is to split each string in list_of_line using a period as the delimiter. You can then check whether the resulting list is 4 items long, which would indicate that the string is an IP address, whereupon you can grab the original string from list_of_line and do whatever you need to with it. Do you want some pseudocode?

Note: While this approach is simple and readable, it does have a few drawbacks. First, it's probably somewhat slow, although if this is an assignment, speed most likely isn't really a problem. Secondly, you may have other items in list_of_line which have the same format as an IP (I suppose that's pretty unlikely, though), in which case you'd get non-IP results in your list of IPs. Just a few things to be aware of.
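A minimal sketch of the split-on-period idea, in Python rather than pseudocode. The helper names looks_like_ip and ips_in_line are invented for this example. To reduce the false positives mentioned in the note, it goes one step beyond counting the pieces and also requires each piece to be a number no larger than 255, which is an assumption about what should count as an IP here.

```python
def looks_like_ip(token):
    """True if token splits into exactly four numeric parts, each 0-255."""
    parts = token.split(".")
    # isdecimal() is False for empty strings and for anything int() cannot parse
    return len(parts) == 4 and all(p.isdecimal() and int(p) <= 255 for p in parts)

def ips_in_line(line):
    """Return the whitespace-separated fields of a log line that look like IPs."""
    return [token for token in line.split() if looks_like_ip(token)]

line = "Jan 10 09:32:09 j4-be03 sshd[3876]: Failed password for root from 218.241.173.35 port 47084 ssh2"
print(ips_in_line(line))  # ['218.241.173.35']
```

A field such as rhost=218.241.173.35 from the pam_unix lines also splits into four pieces, but its first piece is not numeric, so this version rejects it; a check on length alone would have accepted it.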

The other solution would be to use Python's regular expression module, re. You can Google that for more info; it's a bit complicated.

Upvotes: 0
