Reputation: 1585
I have most of this regex down, but I'm having trouble with a lookahead. I want to separate a string into a postcode followed by two values, each of which is either text or a number. The numbers can be of the form:
1
1.5
1.55
11.55
The text for the middle bit can be "No minimum" and the text for the third bit can only be "Free".
E.g.
"YO1£ 10Free" ==> YO1; 10; Free
or
"yo1£ 8£ 0.5" ==> yo1; 8; 0.5
or
"yo1No minimum£ 0.75" ==> yo1; No minimum; 0.75
I have the first bit done with this:
string = "YO1£ 10Free"
patternPostCode = re.compile("[a-zA-Z]{1,2}[0-9][a-zA-Z0-9]?")
postCode = patternPostCode.findall(string)
The figures in the string are found by:
patternCost = re.compile(r"(?<=\xa3 )([0-9]|[0-9][0-9]|[0-9]?[0-9]?.[0-9]|[0-9]?[0-9]?.[0-9][0-9])")
I have difficulty adding the 'or text equals "No minimum"' to the patternCost search. I also can't manage to include the lookahead Â. Adding this at the end doesn't work:
(?<=\xc2)
Any help would be appreciated.
Upvotes: 1
Views: 630
Reputation: 5874
I came up with this on Python 2.7:
# -*- coding: utf-8 -*-
import re
raw_string = "YO1£ 10.01Free"
string = raw_string.decode('utf-8')
patternPostCode = re.compile(u"^(\w{3}.*)\s+(\d+\.?\d*)(\w+)$",flags=re.UNICODE)
postCode = patternPostCode.findall(string)
print postCode
print u'; '.join(postCode[0])
This returns:
[(u'YO1\xc2\xa3', u'10.01', u'Free')]
YO1£; 10.01; Free
First, the raw string I copied from SO appeared to be a bytestring, so I had to decode it to unicode (see "byte string vs. unicode string. Python"). I think you may be having unicode encoding errors in general; the Â symbol is a classic telltale of that.
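To illustrate where a stray Â typically comes from, here is a minimal sketch. It assumes the text was encoded as UTF-8 and then decoded with a single-byte codec such as Latin-1; I can't confirm that is what happened to your data, and the variable names are just for illustration.

```python
# -*- coding: utf-8 -*-
pound = u"\u00a3"                         # the pound sign as a text (unicode) string
utf8_bytes = pound.encode("utf-8")        # UTF-8 stores it as two bytes: 0xC2 0xA3
mojibake = utf8_bytes.decode("latin-1")   # wrong codec: each byte becomes its own character

print(repr(utf8_bytes))
print(mojibake == u"\u00c2\u00a3")           # the two characters are 'Â' and '£'
print(utf8_bytes.decode("utf-8") == pound)   # the right codec gives back the single '£'
```

If that is the cause, decoding the bytes as UTF-8 before running the regex should make the Â disappear, so no lookbehind for \xc2 would be needed.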
I then made your regex unicode-friendly, with the re.UNICODE flag. This means you can use \w to mean "alphanumeric" and \d to mean "digits" in a unicode-friendly way.
http://docs.python.org/2/library/re.html#module-re
Since regexes are often mistaken for line noise, lemme unpack for you:
u"^(\w{3}.*)\s+(\d+\.?\d*)(\w+)$"
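One way to read it piece by piece is to write the same pattern with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch; the sample input assumes the pound sign has been decoded correctly to a single character.

```python
import re

# The same pattern, spread out so each part can carry a comment.
pattern = re.compile(r"""
    ^               # start of the string
    (\w{3}.*)       # group 1: three word characters, then anything (greedy)
    \s+             # one or more whitespace characters
    (\d+\.?\d*)     # group 2: digits, an optional dot, optional further digits
    (\w+)           # group 3: one or more word characters
    $               # end of the string
""", re.UNICODE | re.VERBOSE)

match = pattern.match(u"YO1\u00a3 10.01Free")
print(match.groups())
```

Note that group 1 swallows everything up to the whitespace, so the pound sign ends up attached to the postcode.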
It's certainly not the prettiest regex I've ever written, but hopefully it's enough to get you started.
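If you want something closer to the three examples in the question, the following is a rough sketch rather than a definitive solution. It assumes the pound sign is correctly decoded (U+00A3), that numbers have at most two digits on either side of the decimal point, and that the only allowed texts are "No minimum" and "Free". The names NUMBER, DELIVERY_ROW, and split_row are made up for this example.

```python
import re

NUMBER = r"\d{1,2}(?:\.\d{1,2})?"   # matches 1, 1.5, 1.55, 11.55

DELIVERY_ROW = re.compile(
    r"^(?P<postcode>[A-Za-z]{1,2}[0-9][A-Za-z0-9]?)"
    r"\s*(?:\xa3\s*(?P<min_amount>" + NUMBER + r")|(?P<min_text>No minimum))"
    r"\s*(?:\xa3\s*(?P<fee_amount>" + NUMBER + r")|(?P<fee_text>Free))$"
)

def split_row(text):
    """Return (postcode, minimum, delivery), or None if the text does not match."""
    match = DELIVERY_ROW.match(text)
    if match is None:
        return None
    # Each slot is either a number after a pound sign or the fixed text.
    minimum = match.group("min_amount") or match.group("min_text")
    delivery = match.group("fee_amount") or match.group("fee_text")
    return match.group("postcode"), minimum, delivery

for row in (u"YO1\xa3 10Free", u"yo1\xa3 8\xa3 0.5", u"yo1No minimum\xa3 0.75"):
    print(u"; ".join(split_row(row)))
```

Matching the pound sign explicitly, instead of using a lookbehind, keeps it out of the captured groups and lets "No minimum" and "Free" sit in the same alternation as the numbers.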
Upvotes: 1