RT.

Reputation: 445

Regular expression for parsing Active Directory distinguishedName

The value of the distinguishedName attribute in Active Directory (AD) typically has the format:

CN=lastName\, firstName,OU=Users - XYZ,OU=Users-Test Place,OU=UsersAll,DC=Dom1,DC=Dom2

I would like to parse it with a regular expression and get back the following values:

CN=lastName\, firstName
OU=Users - XYZ
OU=Users-Test Place
OU=UsersAll
DC=Dom1
DC=Dom2

The pattern "\w+=\w+" didn't help.

I see the problem but am at a loss for a solution.

Thanks for your help.

Upvotes: 0

Views: 6346

Answers (1)

rici

Reputation: 241861

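The syntax for distinguished names is set out in RFC 4514 (which replaces RFC 2253), and it is not really fully parseable with a regex. OpenLDAP contains some library functions which will parse and validate, for what it's worth. However, if you need a quick-and-dirty regex, you can use the following POSIX ERE:

([^\,]|\\.)*

In a POSIX bracket expression the backslash is an ordinary character, so [^\,] excludes both \ and the comma. In Perl, Python, or other languages with similar regex extensions, a backslash inside a bracket expression is itself an escape character, so it has to be doubled there. In those languages, use the following instead, which also avoids the needless capture:

(?:[^\\,]|\\.)*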

This means "match any sequence of characters other than , and \, possibly also including pairs of \ and any single character". It matches a superset of the actual LDAP syntax, which only allows \ to be followed by a pair of hex digits or one of a handful of special characters. It will therefore accept a number of invalid DN components, but it should accept all valid ones and, I believe, will never swallow a comma which separates DN components.
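For readers working in Python, the pattern described above could be applied roughly as follows. This is only a sketch: split_dn and COMPONENT are illustrative names, not part of any library. The backslash inside the bracket expression is doubled because Python's re module treats it as an escape character there, and + is used in place of * so that re.findall does not return empty matches between components.

```python
import re

# Loose DN component pattern: runs of characters that are neither a comma
# nor a backslash, plus backslash-escaped pairs. It performs no validation
# of attribute types or escape sequences.
COMPONENT = re.compile(r"(?:[^\\,]|\\.)+")


def split_dn(dn):
    """Split a distinguished name on unescaped commas (hypothetical helper)."""
    return COMPONENT.findall(dn)


dn = r"CN=lastName\, firstName,OU=Users - XYZ,OU=Users-Test Place,OU=UsersAll,DC=Dom1,DC=Dom2"
for part in split_dn(dn):
    print(part)
```

Because of the +, empty components (two adjacent commas) are silently skipped rather than reported, which is acceptable for a quick split but not for validation.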

Here's a simple test, in bash, using grep:

$ echo 'CN=lastName\, firstName,OU=Users - XYZ,OU=Users-Test Place,OU=UsersAll,DC=Dom1,DC=Dom2' |
> grep -oE '([^\,]|\\.)*'
CN=lastName\, firstName
OU=Users - XYZ
OU=Users-Test Place
OU=UsersAll
DC=Dom1
DC=Dom2
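Once the components are separated as in the output above, each one can be split into its attribute type and value at the first "=", since attribute types never contain "=". A minimal Python sketch follows, with split_component as a hypothetical helper name. It assumes single-valued components: it does not unescape the value and does not handle multi-valued RDNs joined with +.

```python
def split_component(component):
    """Split 'TYPE=value' at the first '=' into a (type, value) tuple."""
    attr_type, sep, value = component.partition("=")
    if not sep:
        # No '=' at all: this is not a TYPE=value component.
        raise ValueError(f"not a TYPE=value component: {component!r}")
    return attr_type, value


# The six components from the grep output, copied in as literals.
components = [
    r"CN=lastName\, firstName",
    "OU=Users - XYZ",
    "OU=Users-Test Place",
    "OU=UsersAll",
    "DC=Dom1",
    "DC=Dom2",
]

pairs = [split_component(c) for c in components]
for attr_type, value in pairs:
    print(f"{attr_type}: {value}")
```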

Upvotes: 3
