Reputation: 279

Python RE (match a word whose first letter is case-sensitive and the rest case-insensitive)

In the case below I want to match the string "Singapore", where "S" must always be capital and the rest of the letters may be in lower or upper case. But in the string below "s" is in lower case and it still gets matched by the search condition. Can anybody let me know how to implement this?

    import re
    st = "Information in sinGapore "

    if re.search("S""(?i)(ingapore)" , st):
        print("matched")

Singapore => matched  
sIngapore => not matched  
SinGapore => matched  
SINGAPORE => matched

Upvotes: 2

Answers (5)

Victor Wang

Reputation: 937

This is the BEST answer:

(?-i:S)(?i)ingapore

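A note for Python specifically: the `re` module accepts scoped inline flags such as `(?i:...)` and `(?-i:...)` from Python 3.6 on, and from Python 3.11 on a global flag like `(?i)` is only accepted at the very start of the pattern, so there the flag has to be moved to the front. A minimal sketch, assuming Python 3.6 or later (the names `PATTERN_SCOPED`, `PATTERN_GLOBAL` and `has_capital_s_singapore` are illustrative and not part of the original answer):

```python
import re

# Case-sensitive "S", then a case-insensitive group for the rest (Python 3.6+).
PATTERN_SCOPED = re.compile(r"S(?i:ingapore)")

# Equivalent idea: ignore-case for the whole pattern, switched off for the "S" only.
PATTERN_GLOBAL = re.compile(r"(?i)(?-i:S)ingapore")


def has_capital_s_singapore(text, pattern=PATTERN_SCOPED):
    """Return True if text contains 'Singapore' with a capital S (rest in any case)."""
    return pattern.search(text) is not None


# Try the examples from the question.
for word in ("Singapore", "sIngapore", "SinGapore", "SINGAPORE",
             "Information in sinGapore "):
    print(repr(word), has_capital_s_singapore(word))
```

Both patterns should behave the same way; the scoped form `S(?i:ingapore)` is probably the easier one to read.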

Upvotes: 0

eyquem

Reputation: 27575

Since you want to set a GV code according to the caught phrase (a single name or several blank-separated names, I know that), there must be a step in which the code is chosen from a dictionary according to the caught phrase.
So it's easy to take advantage of this step to perform the test on the first letter (which must be uppercase) or on the first name in the phrase, which no regex is capable of.

I chose certain conditions to constitute the test. For example, a dot in a first name is not mandatory, but uppercase letters are. These conditions can easily be changed as needed.
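Stripped of the dot handling and the multi-word names, the idea can be sketched roughly as follows. This is a simplified illustration rather than the full code of this answer: the helper name `lookup_codes` is hypothetical, and the dictionary is a subset of the sample data used further down.

```python
import re

# Subset of the sample dictionary used in this answer.
countries = {'Singapore': 'SG', 'Austria': 'AU', 'Chile': 'CL'}


def lookup_codes(text, labels):
    """Yield (matched text, code) pairs; the code is '- bad match -' when the
    normalised match is not a key, e.g. because its first letter is lowercase."""
    # Find every label regardless of case.
    pattern = re.compile('|'.join(re.escape(k) for k in labels), re.I)
    for ma in pattern.finditer(text):
        found = ma.group()
        # Keep the first letter as typed, normalise the rest to lower case.
        key = found[0] + found[1:].lower()
        yield found, labels.get(key, '- bad match -')


for found, gv in lookup_codes('Singapore sIngapore AUSTRiA chile', countries):
    print('  {!s:15}  {!s:^13}'.format(found, gv))
```

The case test costs nothing extra here because the dictionary lookup has to happen anyway.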

EDIT 1

import re

def regexize(cntry):
    def doot(x):
        return r'\.?'.join(ch for ch in x) + r'\.?'
    to_join = []
    for c in cntry:
        cspl = c.split(' ',1)
        if len(cspl)==1: # 'Singapore','Austria',...
            to_join.append('(%s)%s'
                           % (doot(c[0]), doot(c[1:])))
        else: # 'Den LMM','LMM Den',....
            to_join.append('(%s) +%s'
                           % (doot(cspl[0]),
                              doot(cspl[1].strip(' ').lower())))
    pattern = '|'.join(to_join).join('()')
    return re.compile(pattern,re.I)

def code(X,CNTR,regexize = regexize):
    r = regexize(CNTR)
    for ma in r.finditer(X):
        beg = ma.group(1).split(' ')[0]
        if beg==ma.group(1):
            GV = countries[beg[0]+beg[1:].replace('.','').lower()] \
                 if beg[0].upper()==beg[0] else '- bad match -'
        else:
            try:
                k = next(ki for ki in countries
                         if beg.replace('.','')==ki.split(' ')[0])
                GV = countries[k]
            except StopIteration:
                GV = '- bad match -'
        yield '  {!s:15}  {!s:^13}'.format(ma.group(1), GV)

countries = {'Singapore':'SG','Austria':'AU',
             'Swiss':'CH','Chile':'CL',
             'Den LMM':'DN','LMM Den':'LM'}

s = ('  Singapore  SIngapore  SiNgapore  SinGapore'
     '  SI.Ngapore  SIngaPore  SinGaporE  SinGAPore'
     '  SINGaporE  SiNg.aPoR   singapore  sIngapore'
     '  siNgapore  sinGapore  sINgap.ore  sIngaPore'
     '  sinGaporE  sinGAPore  sINGaporE  siNgaPoRe'
     '    Austria    Aus.trIA    aUSTria    AUSTRiA'
     '  Den L.M.M     Den   Lm.M    DEn Lm.M.'
     '  DEN L.MM      De.n L.M.M.     Den LmM'
     '    L.MM   DEn      LMM DeN     LM.m  Den')

print('\n')
print('\n'.join(res for res in code(s,countries)))

EDIT 2

I improved the code. It's shorter and more readable.
The assert(...) statement verifies that the keys of the dictionary are well formed for the purpose.

import re

def doot(x):
    return r'\.?'.join(ch for ch in x) + r'\.?'

def regexize(labels,doot=doot,
             wg2 = '(%s) *( %s)',wnog2 = '(%s)(%s)',
             ri = re.compile(r'(.(?!.*? )|[^ ]+)( ?) *(.+\Z)')):
    to_join = []
    modlabs = {}
    for K in labels:
        g1,g2,g3 = ri.match(K).groups()
        to_join.append((wg2 if g2 else wnog2)
                       % (doot(g1), doot(g3.lower())))
        modlabs[g1+g2+g3.lower()] = labels[K]
    return (re.compile('|'.join(to_join), re.I), modlabs)



def code(X,labels,regexize = regexize):
    reglab,modlabs = regexize(labels)
    for ma in reglab.finditer(X):
        a,b = tuple(x for x in ma.groups() if x)
        k = (a + b.lower()).replace('.','')
        GV = modlabs[k] if k in modlabs else '- bad match -'
        yield '  {!s:15}  {!s:^13}'.format(a+b, GV)

countries = {'Singapore':'SG','Austria':'AU',
             'Swiss':'CH','Chile':'CL',
             'Den LMM':'DN','LMM Den':'LM'}

assert(all('.' not in k and
          (k.count(' ')==1 or k[0].upper()==k[0])
          for k in countries))

s = ('  Singapore  SIngapore  SiNgapore  SinGapore'
     '  SI.Ngapore  SIngaPore  SinGaporE  SinGAPore'
     '  SINGaporE  SiNg.aPoR   singapore  sIngapore'
     '  siNgapore  sinGapore  sINgap.ore  sIngaPore'
     '  sinGaporE  sinGAPore  sINGaporE  siNgaPoRe'
     '    Austria    Aus.trIA    aUSTria    AUSTRiA'
     '  Den L.M.M     Den   Lm.M    DEn Lm.M.'
     '  DEN L.MM      De.n L.M.M.     Den LmM'
     '    L.MM   DEn      LMM DeN     LM.m  Den')

print('\n'.join(res for res in code(s,countries)))

Upvotes: 2

Adam Matan

Reputation: 136231

As commented, the ugly way would be:

>>> re.search("S[iI][Nn][Gg][Aa][Pp][Oo][Rr][Ee]" , "SingaPore")
<_sre.SRE_Match object at 0x10cea84a8>
>>> re.search("S[iI][Nn][Gg][Aa][Pp][Oo][Rr][Ee]" , "Information in sinGapore")

The more elegant way would be matching Singapore case-insensitive, and then checking that the first letter is S:

reg=re.compile("singapore", re.I)

>>> s="Information in sinGapore"
>>> reg.search(s) and reg.search(s).group()[0]=='S'
False

>>> s="Information in SinGapore"
>>> reg.search(s) and reg.search(s).group()[0]=='S'
True

Update

Following your comment - you could use:

reg.search(s).group().startswith("S")

Instead of:

reg.search(s).group()[0]==("S")

If it seems more readable.
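One caveat with the check above: `reg.search(s)` only looks at the first case-insensitive occurrence, so a string that contains `singapore` before `Singapore` would be rejected. A possible variant, sketched here with `finditer`, examines every occurrence (the helper name `contains_capitalised` is hypothetical):

```python
import re

reg = re.compile("singapore", re.I)


def contains_capitalised(text, first_letter="S"):
    """Return True if any case-insensitive match starts with first_letter."""
    return any(m.group().startswith(first_letter) for m in reg.finditer(text))


print(contains_capitalised("Information in sinGapore"))   # False
print(contains_capitalised("singapore and SinGapore"))    # True
```

It also returns a plain `False` instead of `None` when there is no match at all.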

Upvotes: 5

PaulMcG

Reputation: 63739

You could write a simple lambda to generate the ugly-but-all-re-solution:

>>> leading_cap_re = lambda s: s[0].upper() + ''.join('[%s%s]' %
...                                                     (c.upper(),c.lower())
...                                                         for c in s[1:])
>>> leading_cap_re("Singapore")
'S[Ii][Nn][Gg][Aa][Pp][Oo][Rr][Ee]'

For multi-word cities, define a string-splitting version:

>>> leading_caps_re = lambda s : r'\s+'.join(map(leading_cap_re,s.split()))
>>> print(leading_caps_re('Kuala Lumpur'))
K[Uu][Aa][Ll][Aa]\s+L[Uu][Mm][Pp][Uu][Rr]

Then your code could just be:

if re.search(leading_caps_re("Singapore") , st):
    ...etc...

and the ugliness of the RE would be purely internal.
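The same idea can also be written as ordinary functions, which are a little easier to test. This is only a sketch: the names `leading_cap_pattern` and `leading_caps_pattern` are made up for illustration, and the `re.escape` call is an addition (it is not in the lambdas above) so that characters such as `.` in a name are matched literally.

```python
import re


def leading_cap_pattern(word):
    """Upper-case first character, then every later letter in either case."""
    def either_case(ch):
        # Letters become a two-character class; anything else is escaped literally.
        if ch.isalpha():
            return "[%s%s]" % (ch.upper(), ch.lower())
        return re.escape(ch)
    return re.escape(word[0].upper()) + "".join(either_case(ch) for ch in word[1:])


def leading_caps_pattern(name):
    """Apply leading_cap_pattern to each whitespace-separated word of name."""
    return r"\s+".join(leading_cap_pattern(w) for w in name.split())


print(leading_cap_pattern("Singapore"))
print(leading_caps_pattern("Kuala Lumpur"))
```

For plain ASCII names this should generate the same patterns as the lambdas.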

Upvotes: 2

Vorsprung

Reputation: 34357

Interestingly,

/((S)((?i)ingapore))/

does the right thing in Perl but doesn't seem to work as needed in Python. To be fair, the Python docs spell it out clearly: (?i) alters the whole regexp.

Upvotes: 1
