mp252

Reputation: 475

Searching regex expression, to return string with spaces

I am trying to search a string in Python, using a regex, for a particular word that is preceded by a space and followed by a space. The string I want to search is:

JAKARTA, INDONESIA (1 February 2017)

I want to get back the ", INDONESIA (" part so that I can apply rtrim and ltrim to it. The country could also contain a space, for example United Kingdom.

I have attempted this in my Python code:

import re
text = "JAKARTA, INDONESIA (1 February 2017)"
countryRegex = re.compile(r'^(,)(\s)([a-zA-Z]+)(\s)(\()$')
mo = countryRegex.search(text)
print(mo.group())

However, this raises the following error:

AttributeError: 'NoneType' object has no attribute 'group'

This indicates to me that no match object is being returned.

I then tried my regex on regex101, but it also fails there with "Your regular expression does not match the subject string."

I assumed this would work, as I test for a literal comma (,), then a space (\s), then one or more letters ([a-zA-Z]+), then another space (\s), and finally an opening bracket, which I have made sure to escape (\(). Is there something wrong with my regex?

Upvotes: 1

Views: 8199

Answers (2)

Wiktor Stribiżew

Reputation: 627302

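The anchors are the problem: ^ matches only at the start of the string and $ only at the end, so your pattern can match only when the whole string is a comma, a space, a word, a space and an opening bracket. Once you remove the anchors, the regex will match the string. However, you can get just INDONESIA with a capturing group using: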

,\s*([a-zA-Z]+)\s*\(

See the regex demo. match.group(1) will contain the value.

Details:

  • ,\s* - a comma and zero or more whitespaces (replace * with + if you want at least 1 whitespace to be present)
  • ([a-zA-Z]+) - capturing group 1 matching one or more ASCII letters
  • \s* - zero or more whitespaces
  • \( - a ( literal symbol.

Sample Python code:

import re 
text = "JAKARTA, INDONESIA (1 February 2017)"
countryRegex = re.compile(r',\s*([a-zA-Z]+)\s*\(') 
mo = countryRegex.search(text)
if mo:
    print(mo.group(1))

An alternative regex that would capture anything between ,+whitespace and whitespace+( is

,\s*([^)]+?)\s*\(

See this regex demo. Here, [^)]+? matches one or more characters other than ), as few as possible.
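As a minimal sketch of how the alternative pattern above handles a multi-word country, the snippet below runs it on the question's string and on a second one. The second sample string and the helper name extract_country are assumptions made up for illustration; they are not from the question.

```python
import re

# Alternative pattern from above: lazily capture everything between
# the comma (plus optional whitespace) and the opening bracket.
country_regex = re.compile(r',\s*([^)]+?)\s*\(')

def extract_country(text):
    """Return the captured country, or None when there is no match."""
    mo = country_regex.search(text)
    return mo.group(1) if mo else None

samples = [
    "JAKARTA, INDONESIA (1 February 2017)",
    # Hypothetical second input, invented to show a two-word country.
    "LONDON, United Kingdom (1 February 2017)",
]

for sample in samples:
    print(extract_country(sample))
```

Because the surrounding whitespace is matched outside the capturing group, the captured value should need no further trimming.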

Upvotes: 1

Giacomo Garabello

Reputation: 307

You can try using this regex instead, with a lookbehind and a lookahead, so that it matches only the country part.
Adding a space to the character class lets you match countries like United Kingdom.

(?<=, )([a-zA-Z ]+)(?= \()

Test on Regex101
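A minimal sketch of using this lookaround pattern from Python, assuming the input always has exactly one space after the comma and one before the bracket (Python's re module only supports fixed-width lookbehinds). The helper name find_country and the second sample string are assumptions for illustration.

```python
import re

# Lookbehind requires ", " before the match; lookahead requires " (" after it.
# Neither is part of the match, so group() is already the bare country.
country_regex = re.compile(r'(?<=, )([a-zA-Z ]+)(?= \()')

def find_country(text):
    """Return the matched country, or None when nothing matches."""
    mo = country_regex.search(text)
    return mo.group() if mo else None

print(find_country("JAKARTA, INDONESIA (1 February 2017)"))
# Hypothetical input with a two-word country.
print(find_country("LONDON, United Kingdom (1 February 2017)"))
```

Since the lookarounds consume nothing, the whole match can be used directly without a group index.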

Upvotes: 2
