Karan Kumar

Reputation: 3176

Python - Regex to exclude some lines out of file

Here's what the file of IPs looks like:

# 111.111.111.111     <= exclude starting with # 
112.112.112.112 1     <= exclude one which has 1 next to it after space(s)
113.113.113.113 2     <= exclude one which has 2 next to it after space(s)
114.114.114.114 3     <= print this 
115.115.115.115 4     <= print this and so on

My take on this:

ip = re.findall(r".*\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[\s](?!1|2)", x)

This is not returning the right IPs. I am a JS developer and would appreciate the help.

Upvotes: 0

Views: 121

Answers (4)

DigitShifter

Reputation: 854

This regex might help:

^(?:\d{3}\.){3}\d{3}\s+([^012]|\d{2,})\b

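Note: re.MULTILINE is required because the pattern uses the '^' anchor, which should match at the start of every line rather than only at the start of the string.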

Comment: This pattern also handles numbers larger than 9. A tricky case is 22, as in '116.116.116.116 22'.
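A minimal sketch of how this pattern might be applied, assuming the file contents are already in a string (the `data` sample and the variable names are made up for illustration). Two things to keep in mind: `\d{3}` matches only three-digit octets, as in the sample, and because the pattern contains a capturing group, `re.findall` would return the trailing number rather than the address. The sketch therefore uses `re.finditer` and keeps the first field of each match.

```python
import re

# Hypothetical sample mirroring the question, plus the tricky "22" case.
data = """# 111.111.111.111
112.112.112.112 1
113.113.113.113 2
114.114.114.114 3
115.115.115.115 4
116.116.116.116 22"""

# The pattern from this answer, compiled with MULTILINE so ^ matches at each line start.
pattern = re.compile(r"^(?:\d{3}\.){3}\d{3}\s+([^012]|\d{2,})\b", re.MULTILINE)

# group(0) is the whole match ("<ip> <number>"); keep only the address part.
ips = [m.group(0).split()[0] for m in pattern.finditer(data)]
print(ips)
```

Note that `[^012]` accepts any character other than 0, 1, or 2, not just digits, so input that is less regular than the sample may need a stricter class such as `[3-9]`.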

Upvotes: 1

The fourth bird

Reputation: 163207

You could use a character class to match either 0 or a digit from 3 to 9.

^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+[03-9]$

Regex demo

If the trailing number can have more than one digit, you could use an alternation that also matches 2 or more digits.

^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+(?:[03-9]|\d{2,})$

Regex demo
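In Python, the second pattern could be used roughly as follows. This is a sketch under a few assumptions: the sample `data` string is made up, and the capturing group around the address is an addition that is not part of the pattern above. It is there so that `re.findall` returns only the IP.

```python
import re

# Hypothetical sample based on the question, with a two-digit and a zero case added.
data = """# 111.111.111.111
112.112.112.112 1
113.113.113.113 2
114.114.114.114 3
115.115.115.115 4
116.116.116.116 22
117.117.117.117 0"""

# Added parentheses around the address so findall returns just the IP.
pattern = re.compile(
    r"^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(?:[03-9]|\d{2,})$",
    re.MULTILINE,
)

ips = pattern.findall(data)
print(ips)
```

Because of the `$` anchor, the number has to be the last thing on the line, so lines with trailing whitespace would need to be stripped first.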

Upvotes: 1

Daweo

Reputation: 36360

I would do:

import re
data = '''# 111.111.111.111      
112.112.112.112 1
113.113.113.113 2
114.114.114.114 3 
115.115.115.115 4'''
ip = re.findall(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b(?![\s]+[12])', data, re.MULTILINE)
print(ip)

Output:

['114.114.114.114', '115.115.115.115']

Explanation: ^ combined with re.MULTILINE anchors the match at the start of each line. \b makes sure that the whole address is captured (preventing a partial match such as 112.112.112.11). (?![\s]+[12]) is a negative lookahead that rejects one or more whitespace characters followed by 1 or 2. Note that in Python's re module lookaheads may have variable length, whereas lookbehinds must be fixed length.
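One detail of the lookahead is worth illustrating: it only checks the first digit after the whitespace, so a trailing number such as 22 or 10 is rejected as well. The sketch below compares the pattern from this answer with a stricter variant that rejects only an exact 1 or 2. The stricter pattern and the extra sample line are assumptions added for illustration, not part of the answer.

```python
import re

# Hypothetical sample: the question's data plus a line ending in 22.
data = """# 111.111.111.111
112.112.112.112 1
113.113.113.113 2
114.114.114.114 3
115.115.115.115 4
116.116.116.116 22"""

ip = r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"

# Pattern from the answer: rejects anything whose number starts with 1 or 2.
loose = re.findall(ip + r"(?![\s]+[12])", data, re.MULTILINE)

# Assumed stricter variant: rejects only when 1 or 2 is the whole trailing number.
strict = re.findall(ip + r"(?![ \t]+[12][ \t]*$)", data, re.MULTILINE)

print(loose)
print(strict)
```

With this sample, `loose` omits 116.116.116.116 while `strict` keeps it.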

Upvotes: 1

Dani Mesejo

Reputation: 61910

What about printing the lines that do not start with # and do not end with 1 or 2:

import re

with open("ip.txt") as infile:  # change it to your real file name
    for line in infile:
        ip = line.strip()
        match = re.match(r"#(.+)|(.+?)\s+[12]$", ip)
        if not match:
            print(ip)

Output:

114.114.114.114 3
115.115.115.115 4
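If only the address is wanted rather than the whole line, the same filter could be wrapped in a small function that keeps the first field. This is a sketch: the function name `wanted_ips` and the in-memory `sample` list are hypothetical, and any iterable of lines, such as an open file, should work in place of the list.

```python
import re

# Same pattern as above: a comment line, or a line ending in whitespace plus 1 or 2.
SKIP = re.compile(r"#(.+)|(.+?)\s+[12]$")

def wanted_ips(lines):
    """Return the first field of every line that is not skipped."""
    result = []
    for line in lines:
        text = line.strip()
        # Ignore blank lines and anything the skip pattern matches.
        if not text or SKIP.match(text):
            continue
        result.append(text.split()[0])
    return result

# Hypothetical stand-in for the lines of ip.txt.
sample = [
    "# 111.111.111.111\n",
    "112.112.112.112 1\n",
    "113.113.113.113 2\n",
    "114.114.114.114 3\n",
    "115.115.115.115 4\n",
    "116.116.116.116 22\n",
]
ips = wanted_ips(sample)
print(ips)
```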

Upvotes: 1
