Reputation: 3638
I'm trying to use Python 3.4.0 to pull typeful LDAP FDNs (fully distinguished names) out of a log file, checking each line for a match. However, none of the regex patterns I've tried work. I need to be able to find the full FDN regardless of the type of its first element (e.g. DC, OU, CN).
log_line1 = 'This server name is "CN=Server001,OU=SomeOU,DC=MyDom,DC=org".'
log_line2 = 'Whereas this server is called "cn=Server002,ou=SubContainer,ou=Elsewhere,dc=SubDomain,dc=MyDom,dc=org" and "something else" is also in quotes most likely.'
I'm okay with finding each element of the FDN and concatenating them myself. The closest I've come is this, which pretty much finds every word in the string:
>>> ldappattern = re.compile("cn=[\w-]+,|ou=[\w-]+,|dc=[\w-]+,", re.IGNORECASE)
>>> re.findall(ldappattern, log_line1)
['This', 'server', 'name', 'is', 'CN=Server001,', 'OU=SomeOU,', 'DC=MyDom,', 'DC=org']
Note that these LDAP names can contain spaces, so searching on whitespace is pretty useless, and I can't guarantee that the last element will be anything sensible (e.g. I've seen 'DC=testcompany,DC=internal' as the root elements of a domain, and even single-label DNS names for domains). They should be in quotes, but they won't necessarily be the only thing in quotes on a given line.
Any ideas?
Upvotes: 0
Views: 1536
Reputation: 71578
I would advise always using raw strings for your regexes to avoid any nasty surprises. With that said, I would suggest this regex:
ldappattern = re.compile(r'(?:cn|ou|dc)=[^,"]+', re.IGNORECASE)
I used single quotes for the regex string so that I wouldn't have to escape the double quote I have in the regex.
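As an illustration of the kind of surprise a non-raw string can cause (this example is not from the original answer): in an ordinary string literal, \b is the backspace character, not the regex word boundary, so the same-looking pattern behaves differently depending on whether the string is raw.

```python
import re

# In an ordinary string literal, "\b" is the backspace character (U+0008),
# not the two characters backslash + b that the regex engine expects.
plain = "\bdc"
raw = r"\bdc"

print(len(plain), len(raw))              # 3 4
print(re.search(plain, "dc=org"))        # None: searches for a literal backspace
print(re.search(raw, "dc=org").group())  # dc: here \b is a word boundary
```

This matters for patterns like the (cn|ou|dc)=([\w-]+)\b,? one in the other answer, which only works as intended when written as a raw string.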
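(?:cn|ou|dc) matches any one of cn, ou or dc. The (?:...) makes it a non-capturing group, so findall returns the whole element rather than just the group.
[^,"]+ matches one or more characters other than , and ", so each match stops at the comma before the next element or at the closing quote. Spaces are allowed, which covers names that contain them.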
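A minimal sketch of the regex above applied to the two sample lines from the question, rejoining the elements with commas to rebuild each FDN. It assumes each log line contains at most one DN; if a line held two, their elements would be merged into one string, which this sketch does not handle.

```python
import re

log_line1 = 'This server name is "CN=Server001,OU=SomeOU,DC=MyDom,DC=org".'
log_line2 = ('Whereas this server is called '
             '"cn=Server002,ou=SubContainer,ou=Elsewhere,dc=SubDomain,dc=MyDom,dc=org" '
             'and "something else" is also in quotes most likely.')

# Single-quoted raw string, so the double quote inside needs no escaping.
ldappattern = re.compile(r'(?:cn|ou|dc)=[^,"]+', re.IGNORECASE)

# findall returns each element separately; join them to rebuild the FDN.
results = [','.join(ldappattern.findall(line)) for line in (log_line1, log_line2)]
for fdn in results:
    print(fdn)
# CN=Server001,OU=SomeOU,DC=MyDom,DC=org
# cn=Server002,ou=SubContainer,ou=Elsewhere,dc=SubDomain,dc=MyDom,dc=org
```

The "something else" quoted text is ignored because it contains no cn=, ou= or dc= prefix.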
Upvotes: 0
Reputation: 20163
If I'm understanding you, you want to capture the name and value of each element in strings like this:
CN=Server001,OU=SomeOU,DC=MyDom,DC=org
The following regex is one way to do it. Note that the ending comma must be optional (and it's best to add a word boundary before it), or you'll miss the last element:
(cn|ou|dc)=([\w-]+)\b,?
The name of each item is in capture group 1 and its value in group 2. As you know, it requires the ignore-case flag.
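A short sketch of how the two groups come back in Python: when a pattern has more than one capture group, re.findall returns a list of tuples, one (name, value) tuple per element.

```python
import re

# Group 1 = attribute name, group 2 = value; the comma is optional so the
# last element (which has no trailing comma) is still matched.
pattern = re.compile(r'(cn|ou|dc)=([\w-]+)\b,?', re.IGNORECASE)

dn = 'CN=Server001,OU=SomeOU,DC=MyDom,DC=org'
pairs = pattern.findall(dn)
print(pairs)
# [('CN', 'Server001'), ('OU', 'SomeOU'), ('DC', 'MyDom'), ('DC', 'org')]
```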
"Note that these LDAP names can contain spaces, so whitespace searches are pretty useless"
I don't understand. Your posted demo input contains no spaces.
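For input that does contain spaces, the difference between the two answers' character classes shows up. The DN below is a hypothetical example, not taken from the question's log: [\w-]+ stops at the first space and truncates the value, whereas [^,"]+ from the other answer keeps the whole value.

```python
import re

# Hypothetical DN whose values contain spaces (not from the original log).
dn = 'CN=Server 003,OU=Some OU,DC=MyDom,DC=org'

word_only = re.compile(r'(cn|ou|dc)=([\w-]+)\b,?', re.IGNORECASE)
not_comma_or_quote = re.compile(r'(?:cn|ou|dc)=[^,"]+', re.IGNORECASE)

print(word_only.findall(dn))
# [('CN', 'Server'), ('OU', 'Some'), ('DC', 'MyDom'), ('DC', 'org')]
print(not_comma_or_quote.findall(dn))
# ['CN=Server 003', 'OU=Some OU', 'DC=MyDom', 'DC=org']
```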
Upvotes: 1