alvas

Reputation: 122142

How to pad a character with spaces when it falls between a unicode range?

The goal is to pad a character with spaces when the issubset condition is met, e.g.

[in]:

subset = [chr(ordinal) for ordinal in range(ord(u'\u31c0'), ord(u'\u31ef') + 1)]

text = '这是个小㇈㇋伙子'

[out]:

output_text = '这是个小 ㇈ ㇋ 伙子'

I could do it as such:

def issubset(uchar):
    # True when uchar lies in the inclusive range U+31C0..U+31EF
    return u'\u31c0' <= uchar <= u'\u31ef'

def pad_space_ifsubset(text):
    output = ""
    for ch in text:
        if issubset(ch):
            output += " " + ch + " "
        else:
            output += ch
    return output

text = '这是个小㇈㇋伙子'

pad_space_ifsubset(text)

But is there a simpler way to do this? Perhaps with regex?

Upvotes: 2

Views: 84

Answers (2)

randomir

Reputation: 18697

You can use re.sub with a range pattern over the codepoints of interest, and a group backreference in the replacement string (\g<0> will substitute the entire substring matched, or in this case, a single character from the range):

import re

def pad_space_ifsubset(text):
    # Raw string for the replacement so that \g<0> is not treated as an invalid string escape
    return re.sub(u'[\u31c0-\u31ef]', r' \g<0> ', text)

For example:

>>> text = u'这是个小㇈㇋伙子'
>>> print(pad_space_ifsubset(text))
这是个小 ㇈  ㇋ 伙子
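
Note that adjacent matches end up separated by two spaces, whereas the expected output in the question shows only one. If a single space is wanted, one possible approach is to match whole runs of characters from the range and join each run with spaces. This is only a sketch: the names STROKES and pad_runs are made up for illustration, and it assumes Python 3 strings.

```python
import re

# Hypothetical name: one or more consecutive characters in U+31C0..U+31EF
STROKES = re.compile('[\u31c0-\u31ef]+')

def pad_runs(text):
    # Replace each run with its characters joined by single spaces,
    # plus one space on either side of the whole run.
    return STROKES.sub(lambda m: ' ' + ' '.join(m.group(0)) + ' ', text)

print(pad_runs('这是个小㇈㇋伙子'))  # 这是个小 ㇈ ㇋ 伙子
```

A run at the very start or end of the text still gets a leading or trailing space; calling .strip() on the result would remove it if that matters.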

Upvotes: 2

scharette

Reputation: 9987

One thing I notice is that your issubset function seems unnecessary in this case. If you don't absolutely need a separate function, you could inline the range check instead:

def pad_space_ifsubset(text):
    output = ""
    for ch in text:
        # Inline range check replaces the separate issubset() helper
        if u'\u31c0' <= ch <= u'\u31ef':
            output += " " + ch + " "
        else:
            output += ch
    return output

text = '这是个小㇈㇋伙子'

pad_space_ifsubset(text)

For the space padding, you have many choices but this is the one I'd choose:

output += ' %s ' % ch

Note that what you are using is just fine in my opinion. It is a really simple case and your solution for padding spaces is readable.
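
If you would rather avoid building the string with repeated +=, the same loop can probably be written as a single join over a generator expression. This is a minimal sketch assuming Python 3.6+ (for the f-string); pad_space_join is a hypothetical name used only here.

```python
def pad_space_join(text):
    # Pad characters in the inclusive range U+31C0..U+31EF, pass the rest through
    return "".join(
        f" {ch} " if '\u31c0' <= ch <= '\u31ef' else ch
        for ch in text
    )

print(pad_space_join('这是个小㇈㇋伙子'))
```

It behaves like the loop version, so two adjacent characters from the range are still separated by two spaces.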

Upvotes: 0
