Reputation: 110382
I need to remove the year from a copyright string. The string can take any of the following forms:
2011 Company --> 'Company'
Company 2011 --> 'Company'
2011 1 Company 2 --> '1 Company 2'
1 Company 2 1944 --> '1 Company 2'
How would I remove the 4-digit year and keep only the company name? (Note that the company name may itself contain numbers.)
So far I've tried [0-9]{4}, but I've had trouble turning it into a working re search:
>>> a=re.search('[0-9]{4}',a)
>>> a
<_sre.SRE_Match object at 0x10527b780>
>>> a.match(0)
AttributeError: match
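For reference, re.search returns a Match object (or None when nothing matches), and the matched text is read with its group() method. Match objects have no match() method, which is why the call above raises AttributeError. A minimal Python 3 sketch; the variable names are only illustrative:

```python
import re

s = '2011 Company'
# re.search returns a Match object, or None if the pattern is not found
m = re.search(r'[0-9]{4}', s)
if m is not None:
    # The matched text is read with .group(0), not .match(0)
    year = m.group(0)
    print(year)
```

Note that this only finds the year; removing it from the string is a job for re.sub.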
Upvotes: 0
Views: 319
Reputation: 741
import re

def removeYear(inputStr):
    # Strip a leading 4-digit year followed by a space, e.g. '2011 Company'
    pattern1 = re.compile(r'^\d{4} (.*)')
    outputStr = pattern1.sub(r'\1', inputStr)
    # Strip a space followed by a trailing 4-digit year, e.g. 'Company 2011'
    pattern2 = re.compile(r'(.*) \d{4}$')
    fixedStr = pattern2.sub(r'\1', outputStr)
    print('-->' + fixedStr)

if __name__ == '__main__':
    removeYear('2011 Company')
    removeYear('Company 2011')
    removeYear('2011 1 Company 2')
    removeYear('1 Company 2 1944')
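A minimal variant of the same two-pattern idea, sketched in Python 3, that returns the cleaned string instead of printing it so the result can be reused. The names remove_year, LEADING_YEAR and TRAILING_YEAR are hypothetical, and \s+ is used instead of a single literal space so tabs or repeated spaces are also consumed:

```python
import re

# Leading 4-digit year followed by whitespace, e.g. '2011 Company'
LEADING_YEAR = re.compile(r'^\d{4}\s+')
# Whitespace followed by a trailing 4-digit year, e.g. 'Company 2011'
TRAILING_YEAR = re.compile(r'\s+\d{4}$')

def remove_year(text):
    """Return text with a leading and/or trailing 4-digit year removed."""
    text = LEADING_YEAR.sub('', text)
    return TRAILING_YEAR.sub('', text)

for sample in ['2011 Company', 'Company 2011', '2011 1 Company 2', '1 Company 2 1944']:
    print(repr(remove_year(sample)))
```

Because each pattern requires whitespace between the year and the rest of the text, a longer number at either end (for example 'Company 12011') is left untouched.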
Upvotes: 1
Reputation: 2598
\d{4}
matches exactly four digits. To also trim the surrounding whitespace, you can write it like this:
\s*\d{4}\s*
Is this what you want?
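Used with re.sub, that pattern handles the four examples from the question. A sketch in Python 3; YEAR_WITH_SPACES is just an illustrative name:

```python
import re

# Any run of four digits together with the whitespace on either side of it
YEAR_WITH_SPACES = r'\s*\d{4}\s*'

for sample in ['2011 Company', 'Company 2011', '2011 1 Company 2', '1 Company 2 1944']:
    print(repr(re.sub(YEAR_WITH_SPACES, '', sample)))
```

Because the pattern is not anchored, it also removes a four-digit run in the middle of a name and swallows the spaces on both sides, so 'Acme 1999 Ltd' becomes 'AcmeLtd'. Anchoring it with ^ or $ avoids that.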
Upvotes: 0
Reputation: 1253
Try this:
>>> import re
>>> s = '2011 Company'
>>> removed = re.sub(r'(^\d{4})|(\d{4}$)', '', s).strip()
>>> print(removed)
Company
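Applied to all four inputs from the question, the same expression gives the expected company names. The sketch below (Python 3) also shows a hypothetical word-boundary variant, BOUNDED, which is my own addition rather than part of this answer: the original pattern would strip the last four digits of a longer trailing number, while \b keeps such numbers intact.

```python
import re

samples = ['2011 Company', 'Company 2011', '2011 1 Company 2', '1 Company 2 1944']

# The answer's pattern: four digits at the very start or the very end
ANCHORED = r'(^\d{4})|(\d{4}$)'
# Hypothetical variant: the four digits must also be a whole word
BOUNDED = r'^\d{4}\b|\b\d{4}$'

for s in samples:
    print(repr(re.sub(ANCHORED, '', s).strip()))

# The two differ only when a number longer than four digits sits at an end
print(re.sub(ANCHORED, '', 'Company 12345').strip())  # Company 1
print(re.sub(BOUNDED, '', 'Company 12345').strip())   # Company 12345
```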
Upvotes: 1