Reputation: 48630
I thought I had set up the expression correctly, but the split is not working as intended.
import re

c = re.compile(r'(?<=^\d\.\d{1,2})\s+')
for header in ['1.1 Introduction', '1.42 Appendix']:
    print(re.split(c, header))
Expected result:
['1.1', 'Introduction']
['1.42', 'Appendix']
Instead, I get the following traceback:
Traceback (most recent call last):
  File "foo.py", line 1, in <module>
    c = re.compile(r'(?<=^\d.\d{1,2})\s+');
  File "C:\Python27\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: look-behind requires fixed-width pattern
Upvotes: 1
Views: 780
Reputation: 1324
The error in your regex is the {1,2} part: lookbehinds must be fixed-width, so a variable-length quantifier is not allowed inside one (a fixed count such as {2} is fine).
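To make the fixed-width rule concrete, here is a small sketch that tries a few lookbehinds under the standard re module and reports which ones it accepts. The helper name compiles is made up for illustration.

```python
import re

def compiles(pattern):
    """Return True if `pattern` compiles under the stdlib re module."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed count: the lookbehind always spans exactly 4 characters, so it is accepted.
print(compiles(r'(?<=^\d\.\d{2})\s+'))         # True
# Variable count {1,2}: the width could be 3 or 4 characters, so re rejects it.
print(compiles(r'(?<=^\d\.\d{1,2})\s+'))       # False
# Alternation inside the lookbehind with branches of different widths is also rejected.
print(compiles(r'(?<=^\d\.\d|^\d\.\d\d)\s+'))  # False
```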
Try an online regex tester to check your regex before you put it in code.
But in your case you don't need a regex at all; simply split on the space:
for header in ['1.1 Introduction', '1.42 Appendix']:
    print(header.split(' '))
Result:
['1.1', 'Introduction']
['1.42', 'Appendix']
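One caveat with split(' '): a title that itself contains spaces would be broken into several pieces. A sketch, assuming you want to keep the whole title together, passes a maxsplit of 1 so only the first run of whitespace is used (the third header below is a made-up example):

```python
# The third header is hypothetical, added to show a multi-word title.
headers = ['1.1 Introduction', '1.42 Appendix', '2.3 Related Work']

# split(None, 1): split on the first run of whitespace only, keep the rest intact
results = [header.split(None, 1) for header in headers]
for parts in results:
    print(parts)
# ['1.1', 'Introduction']
# ['1.42', 'Appendix']
# ['2.3', 'Related Work']
```

By contrast, '2.3 Related Work'.split(' ') would give ['2.3', 'Related', 'Work'].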
Hope this helps.
Upvotes: 1
Reputation: 39375
My solution may look clumsy, but since you only need one or two digits after the dot, you can use two fixed-width lookbehinds and alternate between them:
c = re.compile(r'(?:(?<=^\d\.\d\d)|(?<=^\d\.\d))\s+')
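A runnable version of this approach, with one extra hypothetical header to show its limit: each lookbehind is anchored at ^ and has a fixed length (4 and 3 characters), so a section number with three digits after the dot is simply not split.

```python
import re

# The alternation sits outside the lookbehinds, so each assertion
# is fixed-width on its own and re accepts the pattern.
c = re.compile(r'(?:(?<=^\d\.\d\d)|(?<=^\d\.\d))\s+')

# '1.123 Glossary' is a hypothetical header added to show the limit.
for header in ['1.1 Introduction', '1.42 Appendix', '1.123 Glossary']:
    print(c.split(header))
# ['1.1', 'Introduction']
# ['1.42', 'Appendix']
# ['1.123 Glossary']
```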
Upvotes: 0
Reputation: 71548
Lookbehinds in Python's re module cannot be variable-width, so your lookbehind is not valid.
You can use a capture group as a workaround:
import re

c = re.compile(r'(^\d\.\d{1,2})\s+')
for header in ['1.1 Introduction', '1.42 Appendix']:
    print(c.split(header)[1:])  # drop the first element, which is an empty string
Output:
['1.1', 'Introduction']
['1.42', 'Appendix']
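Because the pattern contains a capturing group, re.split keeps the captured number in the result, and since the match starts at position 0 the list begins with an empty string (for example ['', '1.1', 'Introduction']). A related variant, not part of the answer above, is to match the whole header once and read both groups directly, which avoids the empty element altogether. split_header is a hypothetical helper name.

```python
import re

# Group 1: section number; group 2: everything after the whitespace (the title).
HEADER_RE = re.compile(r'^(\d\.\d{1,2})\s+(.*)$')

def split_header(header):
    """Return [number, title], or None if the header does not match."""
    m = HEADER_RE.match(header)
    return list(m.groups()) if m else None

for header in ['1.1 Introduction', '1.42 Appendix']:
    print(split_header(header))
# ['1.1', 'Introduction']
# ['1.42', 'Appendix']
```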
Upvotes: 4