Seth

Reputation: 103

Parsing a text file for a pattern and writing the matches to another file (Python 3.4)

I am trying to open a text file, parse it for a specific regex pattern, and write each match to another text file.

Specifically, the file contains a list of IP addresses, and I want to pull specific ones out of it.

So the file may have

10.10.10.10
9.9.9.9
5.5.5.5
6.10.10.10

And say I want just the IPs that end in 10 (the regex I think I am good with). My example looks for the 10.180.42 or 41.XX IP hosts, but I will adjust as needed.

I've tried several methods and failed miserably at them all. It's days like this I know why I just never mastered any language. But I'm committed to Python, so here goes.

import re

textfile = open("SymantecServers.txt", 'r')

matches = re.findall('^10.180\.4[3,1].\d\d',str(textfile))
print(matches)

This gives me empty brackets. I had to wrap textfile in str() or it just puked. I don't know if this is right.

This just failed all over the place no matter how I fine tuned it.

f = open("SymantecServers.txt","r")
o = open("JustIP.txt",'w', newline="\r\n")
for line in f:
    pattern = re.compile("^10.180\.4[3,1].\d\d")
    print(pattern)
    #o.write(pattern)
    #o.close()
   f.close()

I did get one version working, but it returned the entire line (including the netmask and other text like the hostname, which are all on the same line in the text file). I just want the IP.

Any help on how to read a text file and if it has a pattern of IP grab the full IP and write that into another text file so I end up with a text file with a list of just the IPs I want. I am 3 hours into it and behind on work so going to do the first file by hand...

I am just at a loss what I am missing. Sorry for being a newbie

Upvotes: 1

Views: 2172

Answers (2)

signus

Reputation: 1148

What you're missing is that re.compile() only creates a regular expression object in Python. You never actually match it against anything.

You could try:

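import re

# This isn't the best way to match IPs, but if it fits your use case, keep it for now.
# Raw string with every literal dot escaped, so "." cannot match other characters.
pattern = re.compile(r"^10\.180\.4[13]\.\d\d")

f = open("SymantecServers.txt", 'r')
o = open("JustIP.txt", 'w')

for line in f:
    # match() only looks at the start of the line
    m = pattern.match(line)

    if m is not None:
        # print() is a function in Python 3
        print("Match: %s" % m.group(0))
        o.write(m.group(0) + "\n")

f.close()
o.close()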

This compiles the pattern once, tries to match each line against the compiled object, and then prints and writes the current match. group(0) is the full text the pattern matched, so I can avoid having to split the line to drop the netmask or hostname.

You can also look at re.search(), which finds the pattern anywhere in the line instead of only at the start. If you're running the same regular expression enough times, it becomes more worthwhile to compile it first.
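
As a rough illustration of how match() and search() differ, here is a small sketch. The two sample lines (with netmask and hostname) are invented for the example and are only an assumption about what SymantecServers.txt looks like.

```python
import re

# Hypothetical sample lines; the real file layout is an assumption.
lines = [
    "10.180.43.99 255.255.255.0 serverA",
    "host serverB 10.180.41.17 255.255.255.0",
]

pattern = re.compile(r"10\.180\.4[13]\.\d{1,3}")

# match() only succeeds when the pattern starts at the beginning of the line.
matched = [m.group(0) for m in map(pattern.match, lines) if m]

# search() scans the whole line and returns the first occurrence.
searched = [m.group(0) for m in map(pattern.search, lines) if m]

print(matched)
print(searched)
```

With these lines, match() only picks up the address that sits at the start of its line, while search() finds both.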

Also note that I moved the f.close() to the outside of the for loop.

Upvotes: 0

zmo

Reputation: 24812

Here it is working:

>>> import re
>>> s = """10.10.10.10
... 9.9.9.9
... 5.5.5.5
... 10.180.43.99
... 6.10.10.10"""
>>> re.findall(r'10\.180\.4[31]\.\d\d', s)
['10.180.43.99']
  • You do not really need line boundaries, since you're matching a very specific IP address. As long as your file does not contain odd things like '123.23.234.10.180.43.99.21354' that you don't want to match, it should be OK!
  • Your [3,1] syntax matches 3, 1, or a comma, and you don't want to match a comma ;-)

About your loop:
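
A quick way to see the effect of the comma and the unescaped dots is to run both patterns against strings that are clearly not wanted. This is only a sketch; the bogus strings are made up for the demonstration.

```python
import re

loose = re.compile(r"^10.180\.4[3,1].\d\d")    # pattern from the question
strict = re.compile(r"^10\.180\.4[31]\.\d\d")  # dots escaped, comma removed

# Made-up strings that are not addresses in the wanted range.
bogus = ["10x180.4,.99", "10.180.4,799"]

# The loose pattern accepts them: "." matches any character and
# "[3,1]" also accepts a literal comma.
loose_hits = [s for s in bogus if loose.match(s)]

# The strict pattern rejects both.
strict_hits = [s for s in bogus if strict.match(s)]

print(loose_hits)
print(strict_hits)
```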

import re

r = re.compile(r'10\.180\.4[31]\.\d\d')
with open("SymantecServers.txt","r") as f:
    with open("JustIP.txt",'w', newline="\r\n") as o:
        for line in f:
            matches = r.findall(line)
            for match in matches:
                # write one IP per line; newline="\r\n" turns "\n" into "\r\n"
                o.write(match + "\n")

though if I were you, I'd extract IPs using:

r = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
with open("SymantecServers.txt","r") as f:
    with open("JustIP.txt",'w', newline="\r\n") as o:
        for line in f:
            matches = r.findall(line)
            for match in matches:
                a, b, c, d = match.split('.')
                # 255 is a valid octet value, hence <= rather than <
                if int(a) <= 255 and int(b) <= 255 and int(c) in (43, 41) and int(d) < 100:
                    o.write(match + "\n")

or another way to do it:

r = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})')
with open("SymantecServers.txt","r") as f:
    with open("JustIP.txt",'w', newline="\r\n") as o:
        for line in f:
            # match() only finds an IP at the very start of the line
            m = r.match(line)
            if m:
                a, b, c, d = m.groups()
                if int(a) <= 255 and int(b) <= 255 and int(c) in (43, 41) and int(d) < 100:
                    # group(0) is the whole matched IP
                    o.write(m.group(0) + "\n")

which uses the regex to split the IP address into groups.
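
If you would rather not hand-check the octet ranges, one alternative (a swapped-in technique, not what the code above does) is to let the standard-library ipaddress module validate each candidate. This is a minimal sketch: the helper name wanted_ips and the sample lines are hypothetical, and it assumes the hosts you want are exactly 10.180.41.x and 10.180.43.x.

```python
import ipaddress
import re

# Loose candidate finder: four dot-separated groups of 1 to 3 digits.
candidate = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def wanted_ips(lines):
    """Yield 10.180.41.x and 10.180.43.x addresses found anywhere in each line.

    Hypothetical helper; the accepted range is an assumption based on the question.
    """
    for line in lines:
        for text in candidate.findall(line):
            try:
                ip = ipaddress.IPv4Address(text)
            except ValueError:
                continue  # not a valid IPv4 address, e.g. 999.180.43.1
            a, b, c, _ = ip.packed  # the four octets as integers
            if (a, b) == (10, 180) and c in (41, 43):
                yield str(ip)


# Made-up lines with a netmask and hostname next to the address.
sample = [
    "10.180.43.99 255.255.255.0 serverA",
    "serverB 10.180.41.7 255.255.255.0",
    "10.10.10.10 255.0.0.0 serverC",
    "999.180.43.1 bad",
]
result = list(wanted_ips(sample))
print(result)
```

To write the result out, you could pass the open input file as lines and call o.write(ip + "\n") for each address the generator yields.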

Upvotes: 1
