Santhosh

Reputation: 901

How to extract a word from text in Python

I have this string "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate." in a log file. What I need to do is look for this message and extract the IP address (1.2.3.4) from the log file.

import os
import shutil
import optparse
import sys

def main():
    file = open("messages", "r")
    log_data = file.read()
    file.close()

    search_str = "is currently trusted in the white list, but it is now using a new trusted certificate."

    index = log_data.find(search_str)
    print(index)

    return

if __name__ == '__main__':
    main()

How do I extract the IP address? Your response is appreciated.

Upvotes: 2

Views: 363

Answers (4)

Hernan

Reputation: 6063

A regular expression is the way to go. But if you feel uncomfortable writing them, you can try a small parser that I wrote (https://github.com/hgrecco/stringparser). It translates a format string into a regular expression. In your case, you would do the following:

from stringparser import Parser

parser = Parser("IP {} is currently trusted in the white list, but it is now using a new trusted certificate.")

ip = parser(text)

If you have a file with multiple lines, you can replace the last line with:

with open("log.txt", "r") as fp:
    ips = [parser(line) for line in fp]

Good luck.

Upvotes: 1

hughdbrown

Reputation: 49003

Use regular expressions.

Code like this:

import re

compiled = re.compile(r"""
    .*?                                # Leading junk
    (?P<ipaddress>\d+\.\d+\.\d+\.\d+)  # IP address
    .*?                                # Trailing junk
    """, re.VERBOSE)
text = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."
m = compiled.match(text)
print(m.group("ipaddress"))

And you get this:

>>> import re
>>> 
>>> compiled = re.compile(r"""
...     .*?                                # Leading junk
...     (?P<ipaddress>\d+\.\d+\.\d+\.\d+)  # IP address
...     .*?                                # Trailing junk
...     """, re.VERBOSE)
>>> text = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."
>>> m = compiled.match(text)
>>> print(m.group("ipaddress"))
1.2.3.4

Also, I learned that there is a dictionary of matches, groupdict():

>>> text = "Peer 10.11.6.224 is currently trusted in the white list, but it is now using a new trusted certificate. Consider removing its likely outdated white list entry."
>>> m = compiled.match(text)
>>> print(m.groupdict())
{'ipaddress': '10.11.6.224'}

Later: fixed that. The initial greedy '.*' was eating the leading digits of the IP address (for example, matching 0.11.6.224 instead of 10.11.6.224), so I changed it to be non-greedy. For consistency (but not necessity), I changed the trailing match, too.
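
As a side note, re.search scans the whole string for the first match, so the leading and trailing junk patterns are not strictly needed. A minimal sketch, assuming an IP address is simply four dot-separated runs of digits (it does not check the 0-255 range); the names IP_RE and find_ip are made up for illustration:

```python
import re

# Assumption: an "IP address" is four dot-separated runs of digits.
# This does not validate that each part is in the range 0-255.
IP_RE = re.compile(r"(?P<ipaddress>\d+\.\d+\.\d+\.\d+)")

def find_ip(line):
    """Return the first IP-like token in line, or None if there is none."""
    m = IP_RE.search(line)  # search() looks anywhere in the string
    return m.group("ipaddress") if m else None

print(find_ip("Peer 10.11.6.224 is currently trusted in the white list."))
print(find_ip("no address here"))
```

Returning None for lines without a match avoids the AttributeError you would get from calling group() on a failed match.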

Upvotes: 1

AlG

Reputation: 15157

Really simple answer:

msg = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."

parts = msg.split(' ', 2)

print(parts[1])

results in:

1.2.3.4

You could also do REs if you wanted, but for something this simple...
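
If the log contains other lines too, the same split can be guarded by a check on the rest of the message. A rough sketch, assuming the address is always the second space-separated token; SUFFIX and ip_from_line are illustrative names, not part of the original code:

```python
# The fixed text that follows the address in the message from the question.
SUFFIX = ("is currently trusted in the white list, "
          "but it is now using a new trusted certificate.")

def ip_from_line(line):
    """Return the second token if the rest of the line is the target message."""
    parts = line.strip().split(' ', 2)  # [first word, address, remainder]
    if len(parts) == 3 and parts[2].startswith(SUFFIX):
        return parts[1]
    return None  # not the message we are looking for

print(ip_from_line("IP 1.2.3.4 " + SUFFIX))
print(ip_from_line("something else entirely"))
```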

Upvotes: 5

J.J.

Reputation: 5069

There are dozens of possible approaches; the pros and cons depend on the details of your log file. One example, using the re module:

import re
x = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."
pattern = r"IP ([0-9.]+) is currently trusted in the white list"
m = re.match(pattern, x)
for ip in m.groups():
    print(ip)

If you want to print out every instance of that string in your log file, you'd do something like this:

import re
pattern = r"(IP [0-9.]+ is currently trusted in the white list, but it is now using a new trusted certificate\.)"
# log_data is the full text of the log file; findall returns every match, not just one at the start
for match in re.findall(pattern, log_data):
    print(match)
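
If only the addresses are needed rather than the whole message, the capture group can be narrowed to the IP, and re.findall then returns a list of the captured addresses. A sketch under the assumption that the whole log fits in memory as one string; extract_ips and the sample data are hypothetical:

```python
import re

# Capture only the address; the rest of the message anchors the match.
PATTERN = re.compile(
    r"IP ([0-9.]+) is currently trusted in the white list, "
    r"but it is now using a new trusted certificate\."
)

def extract_ips(log_data):
    """Return every IP address that appears in the target message."""
    return PATTERN.findall(log_data)

# Made-up sample standing in for the contents of the log file.
sample = (
    "IP 1.2.3.4 is currently trusted in the white list, "
    "but it is now using a new trusted certificate.\n"
    "some unrelated line\n"
    "IP 10.11.6.224 is currently trusted in the white list, "
    "but it is now using a new trusted certificate.\n"
)
print(extract_ips(sample))
```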

Upvotes: 2
