Reputation: 269
I'm having trouble writing the correct regular expression for a multiline match. Can someone point out what I'm doing wrong? I'm looping through a basic dhcpd.conf file with hundreds of entries such as:
host node20007
{
hardware ethernet 00:22:38:8f:1f:43;
fixed-address node20007.domain.com;
}
I've gotten various regexes to work for the MAC and the fixed-address separately, but I cannot combine them into one pattern that matches properly.
import re

f = open('/etc/dhcp3/dhcpd.conf', 'r')
re_hostinfo = re.compile(r'(hardware ethernet (.*))\;(?:\n|\r|\r\n?)(.*)', re.MULTILINE)
for host in f:
    match = re_hostinfo.search(host)
    if match:
        print(match.groups())
Currently my match groups will look like:
('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', '')
But I'm looking for something like:
('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')
Upvotes: 7
Views: 14035
Reputation: 343067
Sometimes the easier method is not to use a regex at all. Just an example:
for line in open("dhcpd.conf"):
    line = line.rstrip()
    sline = line.split()
    # Each substring needs its own "in" test; "a" or "b" in line is always true.
    if "hardware ethernet" in line or "fixed-address" in line:
        print(sline[-1])
Another way:
data = open("file").read().split("}")
for item in data:
    item = [i.strip() for i in item.split("\n") if i != '']
    for elem in item:
        if "hardware ethernet" in elem:
            print(elem.split()[-1])
    if item:
        print(item[-1])
output
$ more file
host node20007
{
hardware ethernet 00:22:38:8f:1f:43;
fixed-address node20007.domain.com;
}
host node20008
{
hardware ethernet 00:22:38:8f:1f:44;
some-address node20008.domain.com;
}
$ python test.py
00:22:38:8f:1f:43;
fixed-address node20007.domain.com;
00:22:38:8f:1f:44;
some-address node20008.domain.com;
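If the goal is to collect the values rather than print them, the same split-on-"}" idea can be pushed a little further. The following is only a sketch in Python 3: parse_hosts and SAMPLE are names made up for illustration, and it assumes every entry looks like the ones shown above (one setting per line, each ending in a semicolon, no nested braces or comments).

```python
# Hypothetical sample text standing in for the contents of the file.
SAMPLE = """\
host node20007
{
hardware ethernet 00:22:38:8f:1f:43;
fixed-address node20007.domain.com;
}
host node20008
{
hardware ethernet 00:22:38:8f:1f:44;
some-address node20008.domain.com;
}
"""


def parse_hosts(text):
    """Map each host name to a dict of its settings, without using regex."""
    hosts = {}
    for block in text.split("}"):
        # Drop blank lines and surrounding whitespace.
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines or not lines[0].startswith("host "):
            continue
        name = lines[0].split()[1]
        settings = {}
        for ln in lines[1:]:
            if ln == "{":
                continue
            # The last word is the value; everything before it is the key.
            key, _, value = ln.rstrip(";").rpartition(" ")
            settings[key] = value
        hosts[name] = settings
    return hosts


if __name__ == "__main__":
    for name, settings in parse_hosts(SAMPLE).items():
        print(name, settings)
```

Keying the result by host name keeps the MAC and the address of one entry together, which the print-as-you-go versions above do not do.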
Upvotes: 0
Reputation: 83032
Update: I've just noticed the real reason that you are getting the results that you got. In your code:
for host in f:
    match = re_hostinfo.search(host)
    if match:
        print(match.groups())
Here, host refers to a single line, but your pattern needs to work over two lines.
Try this:
data = f.read()
for x in regex.finditer(data):
    process(x.groups())
where regex is a compiled pattern that matches over two lines, and process stands for whatever you want to do with each match's groups.
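To make the difference concrete, here is a small Python 3 sketch that runs one two-line pattern both ways. SAMPLE, HOST_RE, per_line and whole_text are illustrative names, not part of the question's code, and the pattern assumes the second line is always a fixed-address line.

```python
import re

# Hypothetical sample standing in for the contents of dhcpd.conf.
SAMPLE = """\
host node20007
{
    hardware ethernet 00:22:38:8f:1f:43;
    fixed-address node20007.domain.com;
}
host node20008
{
    hardware ethernet 00:22:38:8f:1f:44;
    fixed-address node20008.domain.com;
}
"""

# \s+ matches the newline (and any indentation) between the two lines.
HOST_RE = re.compile(r'hardware ethernet\s+(\S+);\s+fixed-address\s+(\S+);')


def per_line(text):
    """Search one line at a time, as the loop in the question does."""
    results = []
    for line in text.splitlines(True):
        match = HOST_RE.search(line)
        if match:
            results.append(match.groups())
    return results


def whole_text(text):
    """Search the whole text at once, so a match can span two lines."""
    return [match.groups() for match in HOST_RE.finditer(text)]


if __name__ == "__main__":
    print(per_line(SAMPLE))    # no line contains both parts
    print(whole_text(SAMPLE))  # one tuple per host entry
```

No single line ever contains both the MAC and the address, so the per-line search finds nothing, while finditer over the whole text yields one (MAC, address) tuple per entry.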
If your file is large, and you are sure that the pieces of interest are always spread over exactly two lines, then you could read the file a line at a time: check each line for the first part of the pattern, and set a flag that tells you whether the next line should be checked for the second part. If you are not sure, things get complicated, maybe enough to start looking at the pyparsing module.
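The line-at-a-time idea could look roughly like the Python 3 sketch below. MAC_RE, ADDR_RE and pair_lines are hypothetical names, and the code assumes the address line, when present, immediately follows the hardware ethernet line.

```python
import re

# Assumed shapes of the two lines of interest.
MAC_RE = re.compile(r'hardware ethernet\s+(\S+);')
ADDR_RE = re.compile(r'fixed-address\s+(\S+);')


def pair_lines(lines):
    """Yield (mac, address) pairs found on two consecutive lines."""
    pending_mac = None  # the "flag": set when the previous line held a MAC
    for line in lines:
        if pending_mac is not None:
            addr = ADDR_RE.search(line)
            if addr:
                yield pending_mac, addr.group(1)
        # Remember the MAC (or clear the flag) for the next iteration.
        mac = MAC_RE.search(line)
        pending_mac = mac.group(1) if mac else None


if __name__ == "__main__":
    sample = [
        "host node20007\n",
        "{\n",
        "hardware ethernet 00:22:38:8f:1f:43;\n",
        "fixed-address node20007.domain.com;\n",
        "}\n",
    ]
    for mac, address in pair_lines(sample):
        print(mac, address)
```

Because pair_lines only iterates, it can be handed an open file object directly, so the whole file never has to be held in memory.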
Now back to the original answer, discussing the pattern that you should use:
You don't need MULTILINE, which only changes how ^ and $ behave, and your pattern uses neither; just match whitespace. Build up your pattern using these building blocks:
(1) fixed text
(2) one or more whitespace characters
(3) one or more non-whitespace characters
and then put in parentheses to get your groups.
Try this:
>>> m = re.search(r'(hardware ethernet\s+(\S+));\s+\S+\s+(\S+);', data)
>>> print(m.groups())
('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')
>>>
Please consider using "verbose mode" ... you can use it to document exactly which pieces of pattern match which pieces of data, and it can often help getting the pattern right in the first place. Example:
>>> regex = re.compile(r"""
...     (hardware[ ]ethernet \s+
...         (\S+)   # MAC
...     ) ;
...     \s+         # includes newline
...     \S+         # variable(??) text e.g. "fixed-address"
...     \s+
...     (\S+)       # e.g. "node20007.domain.com"
...     ;
...     """, re.VERBOSE)
>>> print(regex.search(data).groups())
('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')
>>>
Upvotes: 13