Gaurang Tandon

Reputation: 6781

Replace in string based on function output

So, for input:

accessibility,random good bye

I want output:

a11y,r4m g2d bye

So, basically, I have to abbreviate every word of four or more letters in the following format: first_letter + number_of_letters_in_between + last_letter

I tried this:

re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", r"\1" + str(len(r"\2")) + r"\3", s)

But it does not work. In JS, I would easily do:

str.replace(/([A-Za-z])([A-Za-z]{2,})([A-Za-z])/g, function(m, $1, $2, $3){
   return $1 + $2.length + $3;
});

How do I do the same in Python?

EDIT: I cannot afford to lose any punctuation present in original string.

Upvotes: 8

Views: 1576

Answers (7)

Kasravnd

Reputation: 107357

As a more precise alternative, you can pass a separate function to re.sub and use the simple regex r"(\b[a-zA-Z]+\b)".

>>> def replacer(x): 
...    g=x.group(0)
...    if len(g)>3:
...        return '{}{}{}'.format(g[0],len(g)-2,g[-1])
...    else :
...        return g
... 
>>> re.sub(r"(\b[a-zA-Z]+\b)", replacer, s)
'a11y,r4m g2d bye'

As a more general approach, if you want the replaced words in a list (without the punctuation), you can use a list comprehension over re.finditer:

>>> from operator import sub
>>> rep=['{}{}{}'.format(i.group(0)[0],abs(sub(*i.span()))-2,i.group(0)[-1]) if len(i.group(0))>3 else i.group(0) for i in re.finditer(r'(\w+)',s)]
>>> rep
['a11y', 'r4m', 'g2d', 'bye']

re.finditer returns an iterator of match objects; you can iterate over it and get the start and end of each match with its span() method.
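
As a small illustration of what re.finditer yields (only a sketch, using the sample string from the question), each match object carries both the matched text and its offsets, so end minus start gives the word length:

```python
import re

s = "accessibility,random good bye"

# Collect (matched text, (start, end)) for every run of word characters.
spans = [(m.group(0), m.span()) for m in re.finditer(r"\w+", s)]

for text, (start, end) in spans:
    # end - start is the length of the matched word.
    print(text, start, end, end - start)
```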

Upvotes: 1

Avinash Raj

Reputation: 174874

Keep it simple...

>>> s = "accessibility,random good bye"
>>> re.sub(r'\B[A-Za-z]{2,}\B', lambda x: str(len(x.group())), s)
'a11y,r4m g2d bye'

\B matches between two word characters or between two non-word characters, so the pattern matches every letter of a word except the first and the last.
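
To see which part of each word the pattern actually picks up, here is a quick sketch (assuming the same sample string) that lists the matches instead of replacing them:

```python
import re

s = "accessibility,random good bye"

# \B on both sides forces the match to start after the first letter
# and to stop before the last one.
inner = re.findall(r"\B[A-Za-z]{2,}\B", s)
print(inner)  # ['ccessibilit', 'ando', 'oo']
```

"bye" produces no match because only one letter sits between its first and last characters, so it is left untouched by the substitution.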

Upvotes: 1

perreal

Reputation: 98118

Using regex and comprehension:

import re
s = "accessibility,random good bye"
print("".join(w[0] + str(len(w) - 2) + w[-1] if len(w) > 3 else w for w in re.split(r"(\W)", s)))

Gives:

a11y,r4m g2d bye

Upvotes: 0

Padraic Cunningham

Reputation: 180540

tmp, out = "",""
for ch in s:
    if ch.isspace() or ch in {",", "."}:
        out += "{}{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1], ch) if len(tmp) > 3 else tmp + ch
        tmp = ""
    else:
        tmp += ch
out += "{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1]) if len(tmp) > 3 else tmp
print(out)

a11y,r4m g2d bye

If you only want alpha characters use str.isalpha:

tmp, out = "", ""
for ch in s:
    if not ch.isalpha():
        out += "{}{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1], ch) if len(tmp) > 3 else tmp + ch
        tmp = ""
    else:
        tmp += ch
out += "{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1]) if len(tmp) > 3 else tmp
print(out)
a11y,r4m g2d bye

The logic is the same for both; only the check differs. If not ch.isalpha() is True, we have found a non-alpha character, so we process the tmp string and add it to the out string. If len(tmp) is not greater than 3, as per the requirement, we just add the tmp string plus the current char to out.

We need a final out += "{}{}{}" outside the loop to catch the case where the string does not end in a comma, space, etc. If the string did end in a non-alpha character, we would be adding an empty string, so it would make no difference to the output.

It will preserve punctuation and spaces:

s = "accessibility,random   good bye !!    foobar?"
def func(s):
    tmp, out = "", ""
    for ch in s:
        if not ch.isalpha():
            out += "{}{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1], ch) if len(tmp) > 3 else tmp + ch
            tmp = ""
        else:
            tmp += ch
    # Flush the last word, which has no trailing delimiter to trigger it.
    return out + ("{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1]) if len(tmp) > 3 else tmp)
print(func(s))
a11y,r4m   g2d bye !!    f4r?

Upvotes: 2

Blckknght

Reputation: 104852

The issue you're running into is that len(r'\2') is always 2, not the length of the second capturing group in your regular expression. You can use a lambda expression to create a function that works just like the code you would use in JavaScript:

re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])",
       lambda m: m.group(1) + str(len(m.group(2))) + m.group(3),
       s)

The m argument to the lambda is a match object, and the calls to its group method are equivalent to the backreferences you were using before.
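
For readers coming from JavaScript, the same callback can also be written as a named function. This is only a sketch: the name abbreviate is made up for illustration and is not part of the re module.

```python
import re

def abbreviate(match):
    # groups() returns the three captured pieces: first letter,
    # middle letters, last letter.
    first, middle, last = match.groups()
    return first + str(len(middle)) + last

pattern = re.compile(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])")
result = pattern.sub(abbreviate, "accessibility,random good bye")
print(result)  # a11y,r4m g2d bye
```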

It might be easier to just use a simple word matching pattern with no capturing groups (group() can still be called with no argument to get the whole matched text):

re.sub(r'\w{4,}', lambda m: m.group()[0] + str(len(m.group())-2) + m.group()[-1], s)
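
One thing to keep in mind with the \w pattern above: \w also matches digits and underscores, so numbers and snake_case identifiers get abbreviated too. The sketch below compares it with a letters-only character class; the helper name shorten and the sample string are assumptions chosen for illustration.

```python
import re

def shorten(m):
    # Abbreviate the whole match as first char + inner length + last char.
    word = m.group()
    return word[0] + str(len(word) - 2) + word[-1]

s = "snake_case in 2015"

with_w = re.sub(r"\w{4,}", shorten, s)               # digits and "_" count
letters_only = re.sub(r"[A-Za-z]{4,}", shorten, s)   # ASCII letters only

print(with_w)        # s8e in 225
print(letters_only)  # s3e_c2e in 2015
```

Which behaviour is preferable depends on what should count as a "word" in the input.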

Upvotes: 3

Cu3PO42

Reputation: 1473

What you are doing in JavaScript is certainly right: you are passing an anonymous function. What you do in Python is pass a constant expression ("\12\3", since len(r"\2") is evaluated before the function call). It is not a function that can be evaluated for each match!
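
A short check makes this visible. The variable name template is just for illustration; the expression is the replacement argument from the question, evaluated on its own:

```python
# r"\2" is a two-character string (a backslash and "2"), so len() is 2
# no matter what the regex later matches.
template = r"\1" + str(len(r"\2")) + r"\3"

print(len(r"\2"))  # 2
print(template)    # \12\3
```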

While anonymous functions in Python aren't quite as useful as they are in JS, they do the job here:

>>> import re
>>> re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", lambda m: "{}{}{}".format(m.group(1), len(m.group(2)), m.group(3)), "accessability, random good bye")
'a11y, r4m g2d bye'

What happens here is that the lambda is called for each substitution, taking a match object. I then retrieve the needed information and build a substitution string from that.

Upvotes: 8

pythondetective

Reputation: 336

Have a look at the following code

sentence = "accessibility,random good bye"
sentence = sentence.replace(',', " ")
sentence_list = sentence.split(" ")
for item in sentence_list:
    if len(item) >= 4:
        print(item[0] + str(len(item) - 2) + item[-1])

The only thing you need to take care of is the comma and other punctuation characters.

Upvotes: -1
