Reputation: 119
I have multiple log files with LDAP entries, and I'm trying to match only the entries whose createtimestamp falls within a certain date range, capturing the whole entry and not just the timestamp. The entries look like this:
dn: ....
otherattr:
...
createtimestamp: 20130621061525Z
The problem is that I am getting all of the entries that come before the one I want as well.
dn: ....
otherattr:
...
createtimestamp: 20121221082545Z
dn: ....
otherattr:
...
createtimestamp: 20130621061525Z
This is the expression:
dn_search = re.compile(r'dn: (.*?)createtimestamp: 20130[4-6]\d+?Z', flags=re.M|re.S)
I've tried some other expressions but I am either getting only the createtimestamp or unwanted entries. Any ideas?
Upvotes: 2
Views: 485
Reputation: 15010
This regex assumes each group of text starts with dn: and ends with an empty line (or at the next dn: line, or at the end of the input). It matches the entire group of lines and captures the value of the createtimestamp field in group 1.
^dn:(?=(?:(?!^createtimestamp:|^dn:|^\s*(?:\r|\n|$)|\Z).)*^createtimestamp:\s*([^\s\r\n]*))(?:(?!^dn:|^\s*(?:\r|\n|$)|\Z).)*
Link to working example http://repl.it/J0t
Code
import re

# Sample input; the _1 and _2 suffixes only label which entry was matched.
string = """dn: ....
otherattr:
...
createtimestamp: 20121221082545Z_1
dn: ....
otherattr:
...
createtimestamp: 20130621061525Z_2
"""

# group(0) is the whole entry; group(1) is the createtimestamp value.
pattern = re.compile(r'^dn:(?=(?:(?!^createtimestamp:|^dn:|^\s*(?:\r|\n|$)|\Z).)*^createtimestamp:\s*([^\s\r\n]*))(?:(?!^dn:|^\s*(?:\r|\n|$)|\Z).)*', re.M | re.I | re.S)
for matchObj in pattern.finditer(string):
    print("-------")
    print("matchObj.group(1) :", matchObj.group(1))
Returns
-------
matchObj.group(1) : 20121221082545Z_1
-------
matchObj.group(1) : 20130621061525Z_2
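The pattern in the question overmatches because a regex search starts at the leftmost dn: it can find, and with re.S the lazy (.*?) is free to run across entry boundaries until it reaches a timestamp that fits. A simpler alternative, as a rough sketch, is to cut the text into entries first and then test each entry on its own. This assumes every entry starts with dn: at the beginning of a line; the sample values and the names log_text, entry_re and wanted_re are made up for illustration.

```python
import re

# Hypothetical sample data modeled on the layout in the question.
log_text = """dn: cn=first
otherattr: a
createtimestamp: 20121221082545Z
dn: cn=second
otherattr: b
createtimestamp: 20130621061525Z
"""

# One entry = from a line starting with "dn:" up to the next such line
# (or the end of the text). The lookahead keeps the next "dn:" unconsumed.
entry_re = re.compile(r'^dn:.*?(?=^dn:|\Z)', flags=re.M | re.S)

# Same date filter as in the question: April to June 2013.
wanted_re = re.compile(r'^createtimestamp: 20130[4-6]\d+Z', flags=re.M)

entries = entry_re.findall(log_text)
matches = [entry for entry in entries if wanted_re.search(entry)]

for entry in matches:
    print(entry)
```

Because the date test only ever sees one entry at a time, it cannot pull in the entries that come before the one you want.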
Upvotes: 2
Reputation: 91
Don't try to parse LDIF by hand. It's not complicated, but things like attribute and name escaping, and line continuations for long lines, will bite you. Use the LDIF parser from python-ldap.
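To illustrate the line-continuation point: LDIF folds a long line by breaking it and starting the continuation with a single space, so a line-by-line match can see only half of a value. The sketch below undoes only that folding with the standard library; it does nothing about base64 values or escaping, and the sample dn and the name unfold are made up for illustration.

```python
import re

def unfold(ldif_text):
    # A continuation line starts with exactly one space; joining it to the
    # previous line means deleting the line break plus that one space.
    return re.sub(r'\r?\n ', '', ldif_text)

# Hypothetical entry whose dn was folded in the middle of "example".
folded = (
    "dn: cn=Some Very Long Name,ou=People,dc=exam\n"
    " ple,dc=com\n"
    "createtimestamp: 20130621061525Z\n"
)

print(unfold(folded))
```

A real parser handles this and the other cases for you, which is why it is the safer choice.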
Upvotes: 2