Reputation: 375
I've already asked this question, but it was for Ruby, now it's Python's turn! I want to sort the words of a string, keeping non-alphanumeric characters in place, for example:
"hello, sally! seen 10/dec/2016 => ehllo, allsy! eens 01/cde/0126"
Based on the answer I previously received, I tried this:
def sortThisList(listWords):
    for word in listWords:
        print(re.sub('\W+', sortStr(word), word)) #Error

def sortStr(word):
    return "".join(sorted(list(word)))
But this error pops up:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in sortItAll
File ".../lib/python3.6/re.py", line 191, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
Not anymore, thank you ^^. But it's still not sorting properly.
Upvotes: 0
Views: 290
Reputation: 1121524
You are trying to apply the regular expression to the whole list, not the individual word:
for word in textInaList: # textInaList presumably is a list
    print(re.sub('\W+', sortStr(word), textInaList))
# you pass that list into re.sub():    ^^^^^^^^^^^
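To make the difference concrete, here is a minimal sketch that reproduces the error and then applies the pattern per element instead. The list contents and the variable names words, error and cleaned are made up for illustration; they are not from the question.

```python
import re

words = ["hello,", "sally!"]  # hypothetical input list

# Passing the list itself as the string argument raises TypeError.
try:
    re.sub(r'\W+', '', words)
except TypeError as exc:
    error = str(exc)

# Passing each string element individually works.
cleaned = [re.sub(r'\W+', '', word) for word in words]
print(cleaned)  # ['hello', 'sally']
```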
Next, pass your sortStr function itself (not the result of calling it) to re.sub() if you want it to be used for each replacement, and have that function accept a match object. You'll also want to replace \w+ (word characters), not \W+ (non-word characters):
def sortStr(match):
    return "".join(sorted(match.group()))

# raw string, so that \w is not treated as a string escape sequence
print(re.sub(r'\w+', sortStr, sentence))
When you pass in a function as the second argument to re.sub(), it is called for every match found in the third argument, passing in a match object; calling match.group() returns the matched text (so a single word in this case). The return value is then used as the replacement.
Demo:
>>> import re
>>> def sortStr(match):
...     return "".join(sorted(match.group()))
...
>>> sentence = "hello, sally! seen 10/dec/2016"
>>> re.sub(r'\w+', sortStr, sentence)
'ehllo, allsy! eens 01/cde/0126'
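If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the names WORD and sort_words and the optional key parameter are my own additions, not part of the question. Two details to keep in mind: \w also matches the underscore, and sorted() orders by code point, so uppercase letters come before lowercase ones unless you pass a key such as str.lower.

```python
import re

WORD = re.compile(r'\w+')  # hypothetical module-level name

def sort_words(text, key=None):
    """Sort the characters of every word-character run, leaving the rest in place."""
    # The lambda receives a match object for each run and returns its replacement.
    return WORD.sub(lambda m: "".join(sorted(m.group(), key=key)), text)

print(sort_words("hello, sally! seen 10/dec/2016"))  # ehllo, allsy! eens 01/cde/0126
print(sort_words("Sally!"))                          # Sally!  ('S' sorts before 'a')
print(sort_words("Sally!", key=str.lower))           # allSy!
print(sort_words("a_b"))                             # _ab     ('_' is a word character)
```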
Upvotes: 3
Reputation: 4277
It makes more sense to match consecutive alphanumeric characters, sort them and replace the original words while keeping all other characters untouched. That is:
In [25]: s = "hello, sally! seen 10/dec/2016"
In [26]: ns = s
In [27]: for w in re.findall(r'\w+', s):
    ...:     ns = ns.replace(w, "".join(sorted(w)))
    ...:
In [28]: ns
Out[28]: 'ehllo, allsy! eens 01/cde/0126'
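One thing to watch with this loop: str.replace() substitutes every occurrence of the word anywhere in the string, including inside longer words, so the result can be wrong when one word is a substring of another. The sketch below wraps the loop in a function and compares it with a position-aware re.sub() call; the function names and the test string are made up for illustration.

```python
import re

def sort_with_replace(s):
    # The findall/replace loop, wrapped in a hypothetical function.
    ns = s
    for w in re.findall(r'\w+', s):
        ns = ns.replace(w, "".join(sorted(w)))
    return ns

def sort_with_sub(s):
    # Position-aware alternative: each match is replaced exactly where it was found.
    return re.sub(r'\w+', lambda m: "".join(sorted(m.group())), s)

s = "ba abba"
print(sort_with_replace(s))  # ab abab  ('ba' was also replaced inside 'abba')
print(sort_with_sub(s))      # ab aabb
```

For the sentence in the question both versions return the same string; they differ only on inputs like the one above.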
Upvotes: 1