David542

Reputation: 110382

Regular expression to remove the year from a copyright

I need to remove the year from a copyright. The copyright can be in the following forms:

2011 Company --> 'Company'
Company 2011 --> 'Company'
2011 1 Company 2 --> '1 Company 2'
1 Company 2 1944 --> '1 Company 2'

How would I remove the 4-digit copyright year and keep only the company name? Note that the company may include numbers in its name.

So far I've tried [0-9]{4}, but I've had trouble using it with re.search:

>>> a=re.search('[0-9]{4}',a)
>>> a
<_sre.SRE_Match object at 0x10527b780>
>>> a.match(0)
>>> AttributeError: match

Upvotes: 0

Views: 319

Answers (3)

Michael

Reputation: 741

import re

def removeYear(inputStr):
    # Drop a four-digit year at the start, keeping everything after the space.
    pattern1 = re.compile(r'^\d{4} (.*)')
    outputStr = pattern1.sub(r'\1', inputStr)

    # Drop a four-digit year at the end, keeping everything before the space.
    pattern2 = re.compile(r'(.*) \d{4}$')
    fixedStr = pattern2.sub(r'\1', outputStr)
    print('-->' + fixedStr)

if __name__ == '__main__':
    removeYear('2011 Company')
    removeYear('Company 2011')
    removeYear('2011 1 Company 2')
    removeYear('1 Company 2 1944')

Upvotes: 1

Junichi Ito

Reputation: 2598

\d{4}

matches exactly four digits. To also consume any whitespace around the year, you can write it like this:

\s*\d{4}\s*

Is this what you want?
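As a rough sketch of how this pattern might be applied to the examples in the question: re.sub can delete each match, and the matched text of a re.search result comes from .group(0). The helper name strip_year is hypothetical. Note the assumption that the company name never contains four consecutive digits of its own, since this pattern removes every such run wherever it appears.

```python
import re

# Pattern from the answer: four digits plus any surrounding whitespace.
YEAR = re.compile(r'\s*\d{4}\s*')

def strip_year(text):
    """Delete every four-digit run (hypothetical helper, not from the answer)."""
    return YEAR.sub('', text)

# re.search returns a match object (or None). The matched text is read with
# .group(0); match objects have no .match() method, which is why the
# question's a.match(0) raised AttributeError.
m = re.search(r'\d{4}', 'Company 2011')
year = m.group(0) if m else None

for s in ['2011 Company', 'Company 2011', '2011 1 Company 2', '1 Company 2 1944']:
    print(repr(strip_year(s)))
```

Because the pattern is not anchored, a year in the middle of the string would also be removed together with the whitespace on both sides, joining the neighbouring words.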

Upvotes: 0

Syrus Akbary Nieto

Reputation: 1253

Try this:

>>> import re
>>> s = '2011 Company'
>>> removed = re.sub(r'(^\d{4})|(\d{4}$)', '', s).strip()
>>> print(removed)
Company
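A possible variation on the same idea, wrapped in a function so it can be run over all four inputs from the question. This is only a sketch: the name remove_edge_year is hypothetical, and it assumes the year is always separated from the company name by whitespace. Requiring that whitespace means a longer number at either edge is left alone.

```python
import re

# A year at the very start (followed by whitespace) or at the very end
# (preceded by whitespace). Digits inside the company name are untouched.
EDGE_YEAR = re.compile(r'^\d{4}\s+|\s+\d{4}$')

def remove_edge_year(text):
    """Strip a leading and/or trailing four-digit year (hypothetical helper)."""
    return EDGE_YEAR.sub('', text)

for s in ['2011 Company', 'Company 2011', '2011 1 Company 2', '1 Company 2 1944']:
    print(repr(remove_edge_year(s)))
```

Since the separating whitespace is part of the match, no .strip() call is needed afterwards.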

Upvotes: 1
