Reputation: 122142
The goal is to pad a character with spaces when the issubset
condition is met, e.g.
[in]:
subset = [chr(ordinal) for ordinal in range(ord(u'\u31c0'), ord(u'\u31ef') + 1)]
text = '这是个小㇈㇋伙子'
[out]:
output_text = '这是个小 ㇈ ㇋ 伙子'
I could do it as such:
def issubset(uchar):
    if u'\u31c0' <= uchar <= u'\u31ef':
        return True
    return False
def pad_space_ifsubset(text):
    output = ""
    for ch in text:
        if issubset(ch):
            output += " " + ch + " "
        else:
            output += ch
    return output
text = '这是个小㇈㇋伙子'
pad_space_ifsubset(text)
But is there a simpler way to do this? Perhaps with regex?
Upvotes: 2
Views: 84
Reputation: 18697
You can use re.sub with a range pattern over the code points of interest and a group backreference in the replacement string (\g<0> substitutes the entire matched substring, which in this case is a single character from the range):
import re
def pad_space_ifsubset(text):
    # Raw string for the replacement so that \g<0> is not treated as a string escape.
    return re.sub(u'[\u31c0-\u31ef]', r' \g<0> ', text)
For example:
>>> text = u'这是个小㇈㇋伙子'
>>> print(pad_space_ifsubset(text))
这是个小 ㇈ ㇋ 伙子
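One detail worth noting: because each matched character is padded independently, two adjacent matches end up with two spaces between them in the raw string. If a single space between neighbours is wanted, one option is to match whole runs of characters and join them. This is only a sketch; the names STROKES and pad_runs are made up for illustration and are not part of the answer above.

```python
import re

# Hypothetical names, chosen for illustration only.
# The pattern matches one or more consecutive characters in U+31C0..U+31EF.
STROKES = re.compile(u'[\u31c0-\u31ef]+')

def pad_runs(text):
    # Separate the characters of each run with one space, then pad the whole run.
    return STROKES.sub(lambda m: ' ' + ' '.join(m.group(0)) + ' ', text)

print(pad_runs(u'这是个小㇈㇋伙子'))
```

Text without any characters from the range passes through unchanged, and existing spaces elsewhere in the string are left alone.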
Upvotes: 2
Reputation: 9987
One thing I notice is that your issubset function seems unnecessary in this case. If you don't absolutely need a separate function, you could inline the check instead:
def pad_space_ifsubset(text):
    output = ""
    for ch in text:
        if u'\u31c0' <= ch <= u'\u31ef':
            output += " " + ch + " "
        else:
            output += ch
    return output
text = '这是个小㇈㇋伙子'
pad_space_ifsubset(text)
For the space padding, you have many choices but this is the one I'd choose:
output += ' %s ' % ch
Note that what you are using is just fine in my opinion. It is a really simple case and your solution for padding spaces is readable.
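As an illustration of another of those padding choices, the loop can also be written as a generator expression passed to str.join, which avoids building the result by repeated concatenation. A minimal sketch, assuming Python 3; pad_with_join is a hypothetical name used only here.

```python
def pad_with_join(text):
    # Pad characters in the U+31C0..U+31EF range; pass everything else through.
    return ''.join(
        ' {} '.format(ch) if u'\u31c0' <= ch <= u'\u31ef' else ch
        for ch in text
    )

padded = pad_with_join(u'这是个小㇈㇋伙子')
print(padded)
```

The behaviour is the same as the explicit loop: every matching character gets a space on each side, so two adjacent matches are separated by two spaces.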
Upvotes: 0