Bg1850

Reputation: 3082

How to exclude the occurrence of an alpha char at the end of the string

I have a list of movies, where each entry looks like this:

Name: The Godfather: Part II (1974) 1080p 
Genre:  Crime | Drama 
rating:  9.1/10

What I want is the movie name up to the year, that is, The Godfather: Part II.

However, my regular expression always keeps the last p of the Name string.

What I am doing is:

r=re.compile(r"[^a-zA-Z :]")

and then

r.sub("",Name)

The result comes out as:

The Godfather: Part II  p 

My question is: how do I exclude the alpha char at the end using a regular expression?

Upvotes: 1

Views: 37

Answers (2)

Avinash Raj

Reputation: 174706

re.search or re.findall is the way to go.

>>> import re
>>> Name = "The Godfather: Part II (1974) 1080p "
>>> re.findall(r'(.*?)\s+\(\d{4}\)', Name)
['The Godfather: Part II']
>>> re.search(r'(.*?)\s+\(\d{4}\)', Name).group(1)
'The Godfather: Part II'

If you want to use re.sub, then match everything from the year up to the end of the string and replace it with an empty string.

>>> re.sub(r'\s+\(\d{4}\).*', r'', Name)
'The Godfather: Part II'
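
If you need this for a whole list of entries, the re.search version can be wrapped in a small function. This is only a sketch: the name extract_title is made up for illustration, and returning None for entries without a parenthesised four-digit year is an assumption about what you would want in that case.

```python
import re

# Hypothetical helper (not part of the original answer): returns the text
# before " (YYYY)", or None when no parenthesised four-digit year is found.
def extract_title(name):
    match = re.search(r'(.*?)\s+\(\d{4}\)', name)
    return match.group(1) if match else None

print(extract_title("The Godfather: Part II (1974) 1080p "))
print(extract_title("Untitled home video"))
```

Checking for None first avoids the AttributeError that calling .group(1) directly would raise when re.search finds nothing.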

Upvotes: 2

vks

Reputation: 67968

print(re.findall(r"^(.+?)(?=\(\d{4}|\d{4})", Name))

You are better off trying to match what you want than removing what you don't.
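
One caveat with the lookahead pattern above: the lookahead stops right before the opening parenthesis, so the captured group still ends with the space that precedes it. A minimal sketch of one way to tidy that up, assuming trailing whitespace is never part of a title:

```python
import re

Name = "The Godfather: Part II (1974) 1080p "

# Same pattern as above; the capture ends just before "(1974",
# so it still includes the space in front of the parenthesis.
matches = re.findall(r"^(.+?)(?=\(\d{4}|\d{4})", Name)

# rstrip() drops that trailing whitespace from each captured title.
titles = [m.rstrip() for m in matches]
print(titles)
```

Also note that the bare \d{4} alternative stops at any four-digit run, so a title that itself contains one (for example a sequel with a year-like number in its name) would be cut short.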

Upvotes: 1
