Kudzu

Reputation: 3638

Regex to match complete, arbitrary-length, typeful LDAP name

I'm trying to use Python 3.4.0 to pull typeful LDAP FDNs out of a log file (checking each line for a match). However, none of the regex patterns I've tried work. I need to be able to find the full FDN, regardless of the type of the first element of the FDN (e.g. DC, OU, CN).

log_line1 = 'This server name is "CN=Server001,OU=SomeOU,DC=MyDom,DC=org".'
log_line2 = 'Whereas this server is called "cn=Server002,ou=SubContainer,ou=Elsewhere,dc=SubDomain,dc=MyDom,dc=org" and "something else" is also in quotes most likely.'

I'm okay with finding each element of the FDN and concatenating them myself. The closest I've come is this, which pretty much finds every word in the string:

>>> ldappattern = re.compile("cn=[\w-]+,|ou=[\w-]+,|dc=[\w-]+,", re.IGNORECASE)
>>> re.findall(ldappattern, log_line1)
['This', 'server', 'name', 'is', 'CN=Server001,', 'OU=SomeOU,', 'DC=MyDom,', 'DC=org']

Note that these LDAP names can contain spaces, so searching on whitespace is pretty useless, and I can't guarantee that the last element will be anything sensible (e.g. I've seen 'DC=testcompany,DC=internal' as the root elements of a domain, and even single-label DNS names for domains). The names should be in quotes, but they won't necessarily be the only quoted text on a given line.

Any ideas?

Upvotes: 0

Views: 1536

Answers (2)

Jerry

Reputation: 71578

I would advise always using raw strings for your regexes to avoid unpleasant surprises with backslashes. That said, I would suggest this regex:

(?:cn|ou|dc)=[^,"]+

I used single quotes for the regex string so that I wouldn't have to escape the double quote I have in the regex.
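
A minimal Python sketch of using this pattern, assuming the two sample lines from the question and the `re.IGNORECASE` flag from the asker's own code. This is an illustrative reconstruction, not the answerer's own demo code:

```python
import re

# Raw string (r'...') so backslashes reach the regex engine untouched;
# single quotes so the double quote inside the character class needs no escape.
ldappattern = re.compile(r'(?:cn|ou|dc)=[^,"]+', re.IGNORECASE)

log_line1 = 'This server name is "CN=Server001,OU=SomeOU,DC=MyDom,DC=org".'
log_line2 = ('Whereas this server is called '
             '"cn=Server002,ou=SubContainer,ou=Elsewhere,dc=SubDomain,dc=MyDom,dc=org" '
             'and "something else" is also in quotes most likely.')

print(ldappattern.findall(log_line1))
# ['CN=Server001', 'OU=SomeOU', 'DC=MyDom', 'DC=org']
print(ldappattern.findall(log_line2))
# ['cn=Server002', 'ou=SubContainer', 'ou=Elsewhere', 'dc=SubDomain', 'dc=MyDom', 'dc=org']
```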

(?:cn|ou|dc) matches any of cn, ou or dc (in either case, given the re.IGNORECASE flag your code already uses).

[^,"]+ matches one or more characters other than , and ", so spaces inside a value are kept.
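
Since the question asks for the complete FDN, the same element pattern can be repeated and wrapped in double quotes so that each quoted FDN comes back as a single string. This sketch goes beyond the answer above and assumes FDNs are always double-quoted and built only from cn, ou and dc elements; the names `ELEMENT`, `FDN_PATTERN` and the third sample line are hypothetical:

```python
import re

# One element, e.g. "OU=Sales Dept": a type, "=", then anything up to , or ".
ELEMENT = r'(?:cn|ou|dc)=[^,"]+'

# A whole FDN: one element followed by any number of ",element", between
# double quotes. The capture group returns the FDN without the quotes.
FDN_PATTERN = re.compile(r'"({0}(?:,{0})*)"'.format(ELEMENT), re.IGNORECASE)

lines = [
    'This server name is "CN=Server001,OU=SomeOU,DC=MyDom,DC=org".',
    # Adjacent string literals are joined into one list element.
    'Whereas this server is called '
    '"cn=Server002,ou=SubContainer,ou=Elsewhere,dc=SubDomain,dc=MyDom,dc=org" '
    'and "something else" is also in quotes most likely.',
    # Hypothetical line with spaces and a non-standard root domain.
    'Hypothetical: "CN=Web Server,OU=Sales Dept,DC=testcompany,DC=internal".',
]

for line in lines:
    print(FDN_PATTERN.findall(line))
```

Two limitations worth stating: only cn, ou and dc are recognised (widen the alternation if your logs contain other attribute types), and RFC 4514 allows escaped commas (`\,`) inside values, which this pattern does not handle.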

Upvotes: 0

aliteralmind

Reputation: 20163

If I'm understanding you, you want to capture the name and value of each element in strings like this:

CN=Server001,OU=SomeOU,DC=MyDom,DC=org

The following regex is one way to do it. Note that the ending comma must be optional (and it's best to add a word boundary before it), or you'll miss the last element:

(cn|ou|dc)=([\w-]+)\b,?

The name of each item is in capture group 1, and the value is in group 2. As you know, it requires the ignore-case flag.
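
In Python, `re.findall` returns one tuple per match when the pattern has more than one group, so the name/value split falls out directly, and the elements can then be concatenated back into an FDN as the question suggests. A minimal sketch using the question's first sample line; the variable names are illustrative:

```python
import re

# Group 1 captures the attribute type, group 2 its value; the comma is
# optional so the last element (which has no trailing comma) still matches.
element_pattern = re.compile(r'(cn|ou|dc)=([\w-]+)\b,?', re.IGNORECASE)

log_line1 = 'This server name is "CN=Server001,OU=SomeOU,DC=MyDom,DC=org".'

# With two groups, findall returns a list of (type, value) tuples.
pairs = element_pattern.findall(log_line1)
print(pairs)
# [('CN', 'Server001'), ('OU', 'SomeOU'), ('DC', 'MyDom'), ('DC', 'org')]

# finditer yields match objects if you prefer to index groups explicitly.
for m in element_pattern.finditer(log_line1):
    print(m.group(1), m.group(2))

# Reassemble the elements into a single FDN string.
fdn = ",".join("{0}={1}".format(name, value) for name, value in pairs)
print(fdn)
# CN=Server001,OU=SomeOU,DC=MyDom,DC=org
```

Joining every element on a line assumes the line holds only one FDN; with several FDNs per line they would be merged together.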

"Note that these LDAP names can contain spaces, so whitespace searches are pretty useless"

I don't understand. Your posted demo input contains no spaces.
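
The question states that real FDNs may contain spaces, and a quick comparison on a hypothetical line shows that `[\w-]+` stops at the first space, truncating such values, while a negated class like `[^,"]+` keeps them whole. The sample line and pattern names below are made up for illustration; `negated_pattern` is a capturing variant, not code from either answer:

```python
import re

# Hypothetical log line whose FDN values contain spaces.
line = 'Name: "CN=Web Server,OU=Sales Dept,DC=testcompany,DC=internal".'

# Value limited to word characters and hyphens.
word_pattern = re.compile(r'(cn|ou|dc)=([\w-]+)\b,?', re.IGNORECASE)
# Value is anything except a comma or a double quote.
negated_pattern = re.compile(r'(cn|ou|dc)=([^,"]+),?', re.IGNORECASE)

print(word_pattern.findall(line))
# [('CN', 'Web'), ('OU', 'Sales'), ('DC', 'testcompany'), ('DC', 'internal')]
print(negated_pattern.findall(line))
# [('CN', 'Web Server'), ('OU', 'Sales Dept'), ('DC', 'testcompany'), ('DC', 'internal')]
```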

Upvotes: 1
