Reputation: 1083
I have an assignment and I was wondering if you could help. As part of the question I am required to analyse a system log. The log contains information such as the time and date, whether root access was attempted, and the IP address the attempt came from.
My question is: how do I loop through the log and pull out the IP addresses?
myFile = open('syslog', 'r')
for line in myFile.readlines():
    list_of_line = line.split(' ')
So here I've split each line into a list, but how can I loop through it to locate an IP address? Previously I have used fixed locations in the list, but this isn't practical as it only finds one address. I want to search through and find all addresses. Would that mean looking for strings of a certain shape, e.g. xxx.xxx.xx.xx, and specifying that I am looking for numeric values?
edit-
Jan 10 09:32:07 j4-be03 sshd[3876]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.241.173.35 user=root
Jan 10 09:32:09 j4-be03 sshd[3876]: Failed password for root from 218.241.173.35 port 47084 ssh2
Jan 10 09:32:17 j4-be03 sshd[3879]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.241.173.35 user=root
Jan 10 09:32:19 j4-be03 sshd[3879]: Failed password for root from 218.241.173.35 port 47901 ssh2
Jan 10 09:32:26 j4-be03 sshd[3881]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.241.173.35 user=root
Jan 10 09:32:29 j4-be03 sshd[3881]: Failed password for root from 218.241.173.35 port 48652 ssh2
I've been told to ignore the lines containing pam_unix and focus on the lines containing "Failed password for root", as they are duplicate entries. I'm about to try the regular expression answer now, although I really don't understand what is going on.
Upvotes: 0
Views: 2892
Reputation: 414315
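You can test whether a line contains a string s with the expression s in line (as in if s in line), then cut the address out from between the prefix and 'port':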
prefix = "Failed password for root from"

def extract_ip(line):
    # get string between `prefix` and 'port'
    return line.partition(prefix)[2].partition('port')[0].strip()

with open('syslog') as f:
    ips = [extract_ip(line) for line in f if prefix in line]
In general it is a job for a regex to tokenize input.
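As a rough sketch of the regex route, assuming the line format shown in the question's sample log: the pattern name FAILED_ROOT, the helper extract_ips, and the two sample lines are illustrative choices, not part of the assignment.

```python
import re

# Assumed line format: "... Failed password for root from <ip> port <n> ssh2"
FAILED_ROOT = re.compile(
    r"Failed password for root from (\d{1,3}(?:\.\d{1,3}){3}) port"
)

def extract_ips(lines):
    """Return the captured IP from every line that matches FAILED_ROOT."""
    ips = []
    for line in lines:
        match = FAILED_ROOT.search(line)
        if match:
            ips.append(match.group(1))
    return ips

# Two lines copied from the sample log in the question.
sample = [
    "Jan 10 09:32:07 j4-be03 sshd[3876]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.241.173.35 user=root",
    "Jan 10 09:32:09 j4-be03 sshd[3876]: Failed password for root from 218.241.173.35 port 47084 ssh2",
]
print(extract_ips(sample))
```

Because the pattern includes the "Failed password for root from" text, the pam_unix lines are skipped without a separate check.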
Upvotes: 1
Reputation: 727
import re

# read the whole log and collect every run of four dot-separated numbers
with open('syslog', 'r') as myFile:
    ip = re.findall(r'[0-9]+(?:\.[0-9]+){3}', myFile.read())
print(ip)
Don't you just love python?
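One caveat: that pattern accepts any four dot-separated runs of digits, so something like 999.1.2.3 would also match. A minimal sketch of filtering the candidates with the standard library's ipaddress module (Python 3 only); the names CANDIDATE and valid_ips and the sample string are made up for illustration.

```python
import ipaddress
import re

# Same pattern as above: four dot-separated runs of digits.
CANDIDATE = re.compile(r'[0-9]+(?:\.[0-9]+){3}')

def valid_ips(text):
    """Return regex candidates that ipaddress accepts as IPv4 addresses."""
    result = []
    for candidate in CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            # e.g. an octet above 255; skip it
            continue
        result.append(candidate)
    return result

print(valid_ips("from 218.241.173.35 port 47084, version 999.1.2.3"))
```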
Upvotes: 0
Reputation: 53311
One solution (albeit a little unwieldy) is to split each string in list_of_line using a period as the delimiter. Once you've done that, you can check whether the resulting list is 4 items long, which would indicate that it is an IP address, whereupon you can grab the original string from list_of_line and do whatever you need to with it. Do you want some pseudocode?
Note: While this approach is simple and readable, it does have a few drawbacks. First, it's probably somewhat slow, although if this is an assignment, speed most likely isn't a problem. Second, you may have other items in list_of_line which have the same format as an IP (I suppose that's pretty unlikely, though), in which case you'd get non-IP results in your list of IPs. Just a few things to be aware of.
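The split-on-period idea could be sketched roughly as follows. The helper names looks_like_ip and ips_in_line are hypothetical, and the extra check that each part is a number from 0 to 255 is an addition meant to reduce the false positives mentioned above; it is not part of the basic approach.

```python
def looks_like_ip(token):
    """True if token is four dot-separated numbers, each in 0-255."""
    parts = token.split('.')
    if len(parts) != 4:
        return False
    # Extra check (an assumption beyond the 4-item test): numeric octets only.
    return all(p.isascii() and p.isdigit() and int(p) <= 255 for p in parts)

def ips_in_line(line):
    """Split the line on whitespace and keep tokens that look like IPs."""
    return [tok for tok in line.split() if looks_like_ip(tok)]

# One line copied from the sample log in the question.
line = "Jan 10 09:32:09 j4-be03 sshd[3876]: Failed password for root from 218.241.173.35 port 47084 ssh2"
print(ips_in_line(line))
```

A token such as rhost=218.241.173.35 is rejected here because its first part is not purely numeric, which happens to suit the instruction to ignore the pam_unix lines.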
The other solution would be to use Python's regular expression module, re. You can search for more information on it; it's a bit complicated.
Upvotes: 0