Reputation: 475
I am trying to search a string in Python, using a regex, for a particular word that is preceded by a space and followed by a space. The string I want to search is:
JAKARTA, INDONESIA (1 February 2017)
and I want to get back the ", INDONESIA (" part so I can apply rtrim and ltrim to it. The value returned could also be a multi-word name such as United Kingdom.
I have attempted this in my Python code:
import re
text = "JAKARTA, INDONESIA (1 February 2017)"
countryRegex = re.compile(r'^(,)(\s)([a-zA-Z]+)(\s)(\()$')
mo = countryRegex.search(text)
print(mo.group())
However, this raises the following error:
AttributeError: 'NoneType' object has no attribute 'group'
This indicates to me that search() found no match and returned None instead of a match object.
I then tried my regex in regex101, but it also fails there, saying "Your regular expression does not match the subject string."
I assumed this would work, as I test for a literal comma (,), then a space (\s), then one or more letters ([a-zA-Z]+), then another space (\s), and finally an opening bracket, making sure I have escaped it (\(). Is there something wrong with my regex?
Upvotes: 1
Views: 8199
Reputation: 627302
Once you remove the anchors (^ matches the start-of-string position and $ matches the end-of-string position), the regex will match the string. However, you can get INDONESIA directly with a capturing group using:
,\s*([a-zA-Z]+)\s*\(
See the regex demo. match.group(1) will contain the value.
Details:
,\s* - a comma and zero or more whitespaces (replace * with + if you want at least 1 whitespace to be present)
([a-zA-Z]+) - capturing group 1 matching one or more ASCII letters
\s* - zero or more whitespaces
\( - a literal ( symbol.
import re
text = "JAKARTA, INDONESIA (1 February 2017)"
countryRegex = re.compile(r',\s*([a-zA-Z]+)\s*\(')
mo = countryRegex.search(text)
if mo:
    print(mo.group(1))
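To make the effect of the anchors concrete, here is a minimal sketch that runs the question's original pattern with and without ^ and $ against the same string. The variable names (anchored, unanchored, and so on) are illustrative only and are not from the original post.

```python
import re

text = "JAKARTA, INDONESIA (1 February 2017)"

# The question's pattern: ^ forces the comma to be the first character
# and $ forces the "(" to be the last one, so it cannot match here.
anchored = re.compile(r'^(,)(\s)([a-zA-Z]+)(\s)(\()$')

# The same pattern with the anchors removed.
unanchored = re.compile(r'(,)(\s)([a-zA-Z]+)(\s)(\()')

anchored_match = anchored.search(text)
unanchored_match = unanchored.search(text)

print(anchored_match)             # None
print(unanchored_match.group())   # , INDONESIA (
print(unanchored_match.group(3))  # INDONESIA
```

Because search() returns None when nothing matches, checking the result before calling group() (as in the snippet above with if mo:) avoids the AttributeError from the question.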
An alternative regex that would capture anything between , + whitespace and whitespace + ( is
,\s*([^)]+?)\s*\(
See this regex demo. Here, [^)]+? matches 1+ chars other than ) as few as possible.
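As a rough illustration of the difference, the sketch below runs both patterns over two strings. The second string (with United Kingdom) is a made-up sample based on the value mentioned in the question, and the names letters_only, lazy_any and samples are hypothetical.

```python
import re

# Pattern 1: letters only, so a name containing a space cannot match.
letters_only = re.compile(r',\s*([a-zA-Z]+)\s*\(')
# Pattern 2: lazily matches any characters other than ")" up to the "(".
lazy_any = re.compile(r',\s*([^)]+?)\s*\(')

samples = [
    "JAKARTA, INDONESIA (1 February 2017)",
    "LONDON, United Kingdom (1 February 2017)",  # assumed sample input
]

results = []
for s in samples:
    m1 = letters_only.search(s)
    m2 = lazy_any.search(s)
    # Store the captured group, or None when the pattern does not match.
    results.append((m1.group(1) if m1 else None,
                    m2.group(1) if m2 else None))

print(results)
```

With these inputs, the letters-only pattern finds INDONESIA but returns no match for the two-word name, while the lazy pattern captures both names without the surrounding whitespace.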
Upvotes: 1
Reputation: 307
You can try using this regex instead, with a lookbehind and a lookahead, so that it matches only the country part.
Adding a space to the character class lets you match names like United Kingdom.
(?<=, )([a-zA-Z ]+)(?= \()
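A short sketch of how this lookaround pattern might be used in Python follows. The second input string and the names pattern, samples and matches are assumptions made for illustration. Note that Python's re module requires a fixed-width lookbehind, which ", " satisfies.

```python
import re

# Lookbehind requires ", " before the match; lookahead requires " (" after it.
# Neither is part of the matched text, so no trimming is needed afterwards.
pattern = re.compile(r'(?<=, )([a-zA-Z ]+)(?= \()')

samples = [
    "JAKARTA, INDONESIA (1 February 2017)",
    "LONDON, United Kingdom (1 February 2017)",  # assumed sample input
]

matches = []
for s in samples:
    m = pattern.search(s)
    matches.append(m.group() if m else None)

print(matches)  # ['INDONESIA', 'United Kingdom']
```

Unlike the \s*-based patterns, this one expects exactly one space after the comma and one before the bracket, so inputs with different spacing would not match.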
Upvotes: 2