Reputation: 27
I have some strings that use subscript and superscript characters.
Is there any way I can remove them while keeping the rest of the string?
Here is an example: ¹ºUnless otherwise indicated. How can I remove the superscript ¹º?
Thanks in advance!
Upvotes: 2
Views: 3051
Reputation: 3855
ASCII characters have ordinal values in range(128), and subscript/superscript characters are not in the ASCII table. Note that range(128) excludes its upper bound and starts at 0 when no lower bound is given, so it covers the numbers 0 through 127. So, you can strip out any characters which are not in this range (this also removes every other non-ASCII character, such as accented letters):
>>> x = '¹ºUnless otherwise indicated'
>>> y = ''.join([i for i in x if ord(i) < 128])
>>> y
'Unless otherwise indicated'
This iterates over all of the characters of x, excludes any which are not in the ASCII range, and then joins the resulting list of characters back into a str.
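As a minimal sketch of the same filter wrapped in a function (the name strip_non_ascii and the second sample string are only illustrative), note that it drops every character outside ASCII, not just superscripts:

```python
def strip_non_ascii(text):
    """Keep only characters whose code point is below 128 (the ASCII range)."""
    return ''.join(ch for ch in text if ord(ch) < 128)

# The example from the question, written with escapes to be unambiguous.
print(strip_non_ascii('\u00b9\u00baUnless otherwise indicated'))  # Unless otherwise indicated

# Side effect: the accented letter (U+00E9) is removed along with the superscript two.
print(strip_non_ascii('caf\u00e9 x\u00b2'))  # caf x
```

If the text may contain accented letters or non-Latin scripts that should be kept, this approach is probably too aggressive.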
Upvotes: 2
Reputation: 7231
The only sure way is to enumerate all of the superscript and subscript symbols that might occur and remove the characters that match this set.
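A rough sketch of that enumeration, assuming a hand-picked and deliberately incomplete character list (the names SUPERSCRIPTS, SUBSCRIPTS, and strip_listed are hypothetical; extend the lists to match your data):

```python
# Illustrative lists only: add whatever superscript/subscript characters
# actually occur in your strings. The ordinal indicators (U+00AA, U+00BA)
# are included because the question treats U+00BA as a superscript.
SUPERSCRIPTS = '\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207a\u207b\u207c\u207d\u207e\u207f\u2071\u00aa\u00ba'
SUBSCRIPTS = '\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089\u208a\u208b\u208c\u208d\u208e'

# With three arguments, str.maketrans maps every character in the third
# argument to None, so translate() deletes it.
REMOVE_TABLE = str.maketrans('', '', SUPERSCRIPTS + SUBSCRIPTS)

def strip_listed(text):
    """Delete every character that appears in the explicit lists above."""
    return text.translate(REMOVE_TABLE)

print(strip_listed('\u00b9\u00baUnless otherwise indicated'))  # Unless otherwise indicated
```

Nothing outside the lists is touched, so accented letters and other scripts survive.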
If your strings are not too unusual, you can instead filter on the Unicode "Letter, other" (Lo) and "Number, other" (No) categories. Be aware that these categories cover other characters in addition to superscripts and subscripts. For example:
import unicodedata
s = "¹ºUnless otherwise indicated"
cleaned = "".join(c for c in s if unicodedata.category(c) not in ["No", "Lo"])
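A narrower alternative, sketched here as a suggestion rather than as part of the answer above, is to check each character's compatibility decomposition: unicodedata.decomposition() returns a string starting with <super> or <sub> for characters that Unicode defines as superscript or subscript forms. The helper names are hypothetical.

```python
import unicodedata

def is_script_char(ch):
    """True if Unicode gives ch a <super> or <sub> compatibility decomposition."""
    return unicodedata.decomposition(ch).startswith(('<super>', '<sub>'))

def strip_by_decomposition(text):
    """Remove characters that Unicode tags as superscript or subscript forms."""
    return ''.join(ch for ch in text if not is_script_char(ch))

print(strip_by_decomposition('\u00b9\u00baUnless otherwise indicated'))  # Unless otherwise indicated

# CJK letters are category Lo, so the category filter would drop them;
# they have no <super>/<sub> decomposition and are kept here.
print(strip_by_decomposition('H\u2082O \u65e5\u672c\u8a9e'))
```

This only catches characters that carry such a decomposition in the Unicode database, so it is worth spot-checking against your own data.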
Upvotes: 2