nickebowen

Reputation: 41

Regex to parse SDDL

I'm using Python to parse an SDDL string with a regex. The SDDL is always of the form 'type:some text', repeated up to 4 times. The type is one of 'O', 'G', 'D', or 'S', followed by a colon. The 'some text' part varies in length.

Here is a sample SDDL:

O:DAG:S-1-5-21-2021943911-1813009066-4215039422-1735D:(D;;0xf0007;;;AN)(D;;0xf0007;;;BG)S:NO_ACCESS_CONTROL

Here is what I have so far. Two of the tuples are returned just fine, but the other two, ('G', 'S-1-5-21-2021943911-1813009066-4215039422-1735') and ('S', 'NO_ACCESS_CONTROL'), are not.

import re

sddl="O:DAG:S-1-5-21-2021943911-1813009066-4215039422-1735D:(D;;0xf0007;;;AN)(D;;0xf0007;;;BG)S:NO_ACCESS_CONTROL"

matches = re.findall(r'(.):(.*?).:', sddl)

print(matches)

[('O', 'DA'), ('D', '(D;;0xf0007;;;AN)(D;;0xf0007;;;BG)')]

What I'd like to have returned is:

[('O', 'DA'), ('G', 'S-1-5-21-2021943911-1813009066-4215039422-1735'), ('D', '(D;;0xf0007;;;AN)(D;;0xf0007;;;BG)'), ('S', 'NO_ACCESS_CONTROL')]

Upvotes: 4

Views: 893

Answers (2)

Zhehao Mao

Reputation: 1789

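A regex may not be the best tool for this problem. All you really need to do is split on the colons and then fix up the resulting list: every chunk except the last ends with the type letter of the next field, so that letter has to be moved from the end of one value to the front of the next pair.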

chunks = sddl.split(':')
# Every chunk except the last ends with the type letter of the next field.
pairs = [(chunks[i][-1],
          chunks[i+1][:-1] if i < len(chunks) - 2 else chunks[i+1])
         for i in range(len(chunks) - 1)]
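
For readability, the same split-and-repair idea can be unrolled into a loop. This is only a sketch: the function name parse_sddl_by_split is made up for illustration, and it assumes that no value contains a colon and that the string does not start with one.

```python
def parse_sddl_by_split(sddl):
    """Split on ':' and re-pair each type letter with its value."""
    chunks = sddl.split(':')
    pairs = []
    for i in range(len(chunks) - 1):
        type_letter = chunks[i][-1]   # last character before the colon
        value = chunks[i + 1]
        if i < len(chunks) - 2:
            value = value[:-1]        # drop the next field's type letter
        pairs.append((type_letter, value))
    return pairs


# Sample string from the question.
sddl = ("O:DAG:S-1-5-21-2021943911-1813009066-4215039422-1735"
        "D:(D;;0xf0007;;;AN)(D;;0xf0007;;;BG)S:NO_ACCESS_CONTROL")
pairs = parse_sddl_by_split(sddl)
print(pairs)
```

On the sample string this should produce the four tuples the question asks for.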

Upvotes: 0

Andrew Clark

Reputation: 208555

Try the following:

(.):(.*?)(?=.:|$)

Example:

>>> re.findall(r'(.):(.*?)(?=.:|$)', sddl)
[('O', 'DA'), ('G', 'S-1-5-21-2021943911-1813009066-4215039422-1735'), ('D', '(D;;0xf0007;;;AN)(D;;0xf0007;;;BG)'), ('S', 'NO_ACCESS_CONTROL')]

This regex starts out the same way as yours, but instead of including the .: at the end as a part of the match, a lookahead is used. This is necessary because re.findall() will not return overlapping matches, so you need each match to stop before the next match begins.
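
To see the difference in what each pattern consumes, it can help to print the full text of every match with re.finditer(). A small sketch using the sample string from the question (the variable names consumed and lookahead are just for illustration):

```python
import re

sddl = ("O:DAG:S-1-5-21-2021943911-1813009066-4215039422-1735"
        "D:(D;;0xf0007;;;AN)(D;;0xf0007;;;BG)S:NO_ACCESS_CONTROL")

# Original pattern: the trailing ".:" is part of each match, so the next
# field's type letter and colon are consumed and cannot start a new match.
consumed = [m.group(0) for m in re.finditer(r'(.):(.*?).:', sddl)]
print(consumed)

# Lookahead pattern: the next "X:" is only checked, not consumed,
# so the following match can begin right at that type letter.
lookahead = [m.group(0) for m in re.finditer(r'(.):(.*?)(?=.:|$)', sddl)]
print(lookahead)
```

With the original pattern the first match is 'O:DAG:', which swallows the 'G:' that the second field needs. With the lookahead the matches sit back to back and together cover the whole string.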

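The lookahead (?=.:|$) essentially means "match only if the next characters are any single character followed by a colon, or we are at the end of the string". The lookahead itself consumes nothing, so that character and colon remain available as the start of the next match.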
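
Since the question says the type is always one of 'O', 'G', 'D', or 'S', a slightly stricter variant could replace the dots with a character class and collect the result in a dict. This is a sketch under that assumption, not a general SDDL parser; the name SDDL_FIELD is made up here, and the pattern would still split in the wrong place if a value itself contained one of those letters followed by a colon.

```python
import re

# Assumption taken from the question: type letters are only O, G, D, or S.
SDDL_FIELD = re.compile(r'([OGDS]):(.*?)(?=[OGDS]:|$)')

sddl = ("O:DAG:S-1-5-21-2021943911-1813009066-4215039422-1735"
        "D:(D;;0xf0007;;;AN)(D;;0xf0007;;;BG)S:NO_ACCESS_CONTROL")

# findall() returns (type, text) tuples, which dict() accepts directly.
fields = dict(SDDL_FIELD.findall(sddl))
print(fields['G'])
print(fields.get('S'))
```

A dict makes it easy to look up a single component, and .get() returns None for components that are absent, since the question notes the form may repeat fewer than 4 times.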

Upvotes: 2
