Reputation: 445
The value of the distinguishedName attribute in AD is typically in the format:
CN=lastName\, firstName,OU=Users - XYZ,OU=Users-Test Place,OU=UsersAll,DC=Dom1,DC=Dom2
I would like to parse it using a regular expression and get back the following values:
CN=lastName\, firstName
OU=Users - XYZ
OU=Users-Test Place
OU=UsersAll
DC=Dom1
DC=Dom2
The pattern "\w+=\w+" didn't help.
I see the problem but am at a loss for a solution.
Thanks for your help.
Upvotes: 0
Views: 6346
Reputation: 241861
The syntax for Distinguished Names is set out in RFC 4514 (which replaces RFC 2253), and it is not really fully parseable with a regex. OpenLDAP contains some library functions which will parse and validate, for what it's worth. However, if you need a quick-and-dirty regex, you can use the following POSIX ERE: ([^\,]|\\.)*
(In Perl, Python, or other languages with similar regex extensions, use (?:[^\\,]|\\.)* to avoid the needless capture. The backslash inside the brackets is doubled there because those flavors treat a backslash as an escape character within a character class, whereas a POSIX bracket expression takes it literally.)
This means "match any sequence of characters other than , and \, possibly also including pairs of \ and any single character". This is a superset of the actual LDAP specification, which does not allow \ to be followed by anything other than hex digits or one of a handful of special characters, so it will accept a number of invalid DN components, but it should accept all valid ones and, I believe, will never swallow a comma which separates DN components.
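If the looseness matters, the escape rule can be tightened a little. The following is only a sketch in Python, based on my reading of RFC 4514: a backslash must be followed either by two hex digits or by one of the special characters " + , ; < > space # = or another backslash. The names ESCAPE, STRICT_COMPONENT and has_valid_escapes are made up for illustration. It checks escapes only and does not validate the rest of the DN grammar (attribute types, multi-valued RDNs, leading or trailing spaces, and so on).

```python
import re

# Escapes permitted by RFC 4514 (as I read it): a backslash followed by two
# hex digits, or by one of: space " # + , ; < > = or another backslash.
ESCAPE = r'\\(?:[0-9A-Fa-f]{2}|[ "#+,;<>=\\])'

# A component is a non-empty run of ordinary characters (anything except
# backslash and comma) and permitted escapes.
STRICT_COMPONENT = re.compile(r'(?:[^\\,]|' + ESCAPE + r')+')


def has_valid_escapes(component):
    """Return True if every backslash in component starts a permitted escape."""
    return STRICT_COMPONENT.fullmatch(component) is not None


print(has_valid_escapes(r"CN=lastName\, firstName"))  # escaped comma
print(has_valid_escapes(r"CN=bad\xescape"))           # \x is not a permitted escape
```

The quick-and-dirty pattern would accept both of these components; the stricter one rejects the second.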
Here's a simple test, in bash, using grep:
$ echo 'CN=lastName\, firstName,OU=Users - XYZ,OU=Users-Test Place,OU=UsersAll,DC=Dom1,DC=Dom2' |
> grep -oE '([^\,]|\\.)*'
CN=lastName\, firstName
OU=Users - XYZ
OU=Users-Test Place
OU=UsersAll
DC=Dom1
DC=Dom2
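For comparison, here is roughly the same test in Python, as a minimal sketch. It assumes the non-capturing form of the pattern with the backslash doubled inside the character class, and it uses + instead of * so that findall() does not return empty matches. The function name split_dn is made up for illustration.

```python
import re

# Non-capturing form of the quick-and-dirty pattern. The backslash is doubled
# inside the character class because Python would otherwise read "\," there
# as an escaped comma. "+" is used so that findall() skips empty matches.
DN_COMPONENT = re.compile(r"(?:[^\\,]|\\.)+")


def split_dn(dn):
    """Return the components of dn, splitting only on unescaped commas."""
    return DN_COMPONENT.findall(dn)


dn = r"CN=lastName\, firstName,OU=Users - XYZ,OU=Users-Test Place,OU=UsersAll,DC=Dom1,DC=Dom2"
for part in split_dn(dn):
    print(part)
```

Like the grep version, this keeps the escaped comma inside the CN component and is no stricter about what may follow a backslash.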
Upvotes: 3