Reputation: 3082
I have a list of movies that looks like this:
Name: The Godfather: Part II (1974) 1080p
Genre: Crime | Drama
rating: 9.1/10
What I want to achieve is to get the movie name up to the year, that is, The Godfather: Part II.
However, with the regular expression I built, it always keeps the trailing p from the name string.
What I am doing is:
import re
r = re.compile(r"[^a-zA-Z :]")
and then:
r.sub("", Name)
The result comes out as:
The Godfather: Part II p
My question is: how do I exclude the alphabetic character at the end with a regular expression?
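To see why the trailing p survives: [^a-zA-Z :] is a negated character class, so re.sub deletes every single character that is not a letter, space, or colon, wherever it occurs. The p in 1080p is a letter, so it is kept, along with the spaces around the deleted characters. A minimal sketch reproducing this, assuming Name holds the string without the "Name: " prefix:

```python
import re

Name = "The Godfather: Part II (1974) 1080p"

# Negated character class: delete anything that is NOT a letter, space, or colon.
r = re.compile(r"[^a-zA-Z :]")
result = r.sub("", Name)

# Digits and parentheses are gone, but the letter "p" from "1080p" and the
# spaces that surrounded the deleted characters are kept.
print(repr(result))  # 'The Godfather: Part II  p'
```

A character class only looks at one character at a time, so it has no way to know that this p belongs to the resolution tag; anchoring on the (year) instead, as the answers below do, avoids the problem.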
Upvotes: 1
Views: 37
Reputation: 174706
re.search or re.findall is the way to go.
>>> import re
>>> Name = "The Godfather: Part II (1974) 1080p "
>>> re.findall(r'(.*?)\s+\(\d{4}\)', Name)
['The Godfather: Part II']
>>> re.search(r'(.*?)\s+\(\d{4}\)', Name).group(1)
'The Godfather: Part II'
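One caveat with re.search(...).group(1): if a name has no (year), re.search returns None and calling .group(1) on it raises AttributeError. A minimal sketch of a guarded version; the helper name extract_title and the fallback of returning the stripped name unchanged are hypothetical choices, not part of the original answer:

```python
import re

# Lazily capture everything before whitespace followed by a four-digit year in parentheses.
YEAR_RE = re.compile(r'(.*?)\s+\(\d{4}\)')

def extract_title(name):
    """Hypothetical helper: return the title before '(YYYY)', or the whole name if there is no year."""
    m = YEAR_RE.search(name)
    if m is None:
        # No "(YYYY)" in the string: fall back to the stripped input.
        return name.strip()
    return m.group(1)

print(extract_title("The Godfather: Part II (1974) 1080p"))  # The Godfather: Part II
print(extract_title("Some Title 720p"))                       # Some Title 720p
```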
If you want to use re.sub, then match everything from the year through to the end of the string and replace it with an empty string.
>>> re.sub(r'\s+\(\d{4}\).*', r'', Name)
'The Godfather: Part II'
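Unlike calling .group(1) on a re.search result, re.sub never raises: if the pattern does not match, the string is returned unchanged. That makes it convenient for cleaning a whole list in one pass. A sketch assuming the names are already in a Python list (the second entry is made up for illustration):

```python
import re

names = [
    "The Godfather: Part II (1974) 1080p ",
    "Untitled Project 720p",  # hypothetical entry with no year
]

# Delete from the whitespace before "(YYYY)" through the end of the string.
# Names without a year do not match and come back unchanged.
titles = [re.sub(r'\s+\(\d{4}\).*', '', n) for n in names]
print(titles)  # ['The Godfather: Part II', 'Untitled Project 720p']
```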
Upvotes: 2
Reputation: 67968
import re
print(re.findall(r"^(.+?)\s*(?=\(\d{4}|\d{4})", Name))
You are better off matching what you want than removing what you don't want.
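Following that advice, matching also lets you capture the other parts of the name in the same pass. A sketch using named groups; the group names title, year, and quality are illustrative choices, and the pattern assumes the "Title (YYYY) quality" layout shown in the question:

```python
import re

# title: shortest text before the year; year: four digits in parentheses;
# quality: optional trailing non-space token such as "1080p".
NAME_RE = re.compile(r'^(?P<title>.+?)\s*\((?P<year>\d{4})\)\s*(?P<quality>\S*)\s*$')

m = NAME_RE.match("The Godfather: Part II (1974) 1080p")
if m:
    print(m.group('title'))    # The Godfather: Part II
    print(m.group('year'))     # 1974
    print(m.group('quality'))  # 1080p
```

Because the title group is lazy and followed by \s*, the space before the parenthesis is left out of the title.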
Upvotes: 1