Reputation: 3060
I need a regex that captures 2 groups: a movie title and its year. Optionally, there could be a 'from ' string between them.
My expected results are:
first_query="matrix 2013" => ('matrix', '2013')
second_query="matrix from 2013" => ('matrix', '2013')
third_query="matrix" => ('matrix', None)
I've tried two patterns on https://regex101.com/ (Python 3 flavor):
I- r"(.+)(?:from ){0,1}([1-2]\d{3})"
It doesn't match first_query or third_query, and it also keeps 'from' inside group one, which is what I want to avoid.
II- r"(.+)(?:from ){1}([1-2]\d{3})"
Works with second_query, but does not match first_query and third_query.
Is it possible to match all three strings, omitting the 'from ' string from the first group?
Thanks in advance.
Upvotes: 1
Views: 95
Reputation: 15349
import re
pattern = re.compile( r"""
^\s* # start of string (optional whitespace)
(?P<title>\S+) # one or more non-whitespace characters (title)
(?:\s+from)? # optionally, some space followed by the word 'from'
\s* # optional whitespace
(?P<year>[0-9]+)? # optional digit string (year)
\s*$ # end of string (optional whitespace)
""", re.VERBOSE )
for query in [ 'matrix 2013', 'matrix from 2013', 'matrix' ]:
    m = re.match( pattern, query )
    if m: print( m.groupdict() )
# Prints:
# {'title': 'matrix', 'year': '2013'}
# {'title': 'matrix', 'year': '2013'}
# {'title': 'matrix', 'year': None}
Disclaimer: this regex does not contain the logic necessary to reject the first two matches on the grounds that The Matrix actually came out in 1999.
Upvotes: 1
Reputation: 51643
This outputs your expected groups, though the pattern needs an extra optional space (` ?`) in front of the number:
import re
pat = r"^(.+?)(?: from)? ?(\d+)?$"
text = """matrix 2013
matrix from 2013
matrix"""
for t in text.split("\n"):
    print(re.findall(pat, t))
Output:
[('matrix', '2013')]
[('matrix', '2013')]
[('matrix', '')]
Explanation:
^ start of string
(.+?) lazy: any characters, as few as possible
(?: from)? optional non-capturing ` from`
` ?` optional space
(\d+)?$ optional digits up to the end of the string
Demo: https://regex101.com/r/VD0SZb/1
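If you want `None` rather than `''` for a missing year, as in the question's third expected result, one option is a minimal sketch like the one below, which keeps the same pattern but uses `re.match` with `Match.groups()`. `groups()` reports a group that did not participate in the match as `None`, whereas `re.findall` reports it as an empty string. The helper name `parse_query` is hypothetical, chosen for illustration only.

```python
import re

# Same pattern as above: lazy title, optional " from", optional space, optional digits.
PAT = re.compile(r"^(.+?)(?: from)? ?(\d+)?$")


def parse_query(query):
    """Return (title, year), with year set to None when the query has no number."""
    m = PAT.match(query)
    # m is None e.g. for an empty query, since (.+?) needs at least one character.
    return m.groups() if m else None


for q in ["matrix 2013", "matrix from 2013", "matrix"]:
    print(parse_query(q))
# ('matrix', '2013')
# ('matrix', '2013')
# ('matrix', None)
```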
Upvotes: 2
Reputation: 626738
You may use
^(.+?)(?:\s+(?:from\s+)?([12]\d{3}))?$
See the regex demo
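A minimal Python 3 sketch of applying this regex to the three queries from the question, assuming `re.match` on a compiled pattern. Because the whole year part is an optional group, Group 2 comes back as `None` when the query has no year.

```python
import re

# Group 1: title (lazy); Group 2: optional 4-digit year starting with 1 or 2,
# optionally preceded by "from" and whitespace.
pattern = re.compile(r"^(.+?)(?:\s+(?:from\s+)?([12]\d{3}))?$")

for query in ["matrix 2013", "matrix from 2013", "matrix"]:
    m = pattern.match(query)
    if m:
        print(m.groups())
# ('matrix', '2013')
# ('matrix', '2013')
# ('matrix', None)
```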
Details

^ - start of a string
(.+?) - Group 1: any 1+ chars other than line break chars, as few as possible
(?:\s+(?:from\s+)?([12]\d{3}))? - an optional non-capturing group matching 1 or 0 occurrences of:
  \s+ - 1+ whitespaces
  (?:from\s+)? - an optional sequence of a `from` substring followed by 1+ whitespaces
  ([12]\d{3}) - Group 2: 1 or 2 followed by 3 digits
$ - end of string.
Upvotes: 3