libertefudgy

Reputation: 27

How do I remove subscript/superscript characters in Python?

I have some strings that use subscript and superscript characters.

Is there any way I can remove them while keeping the rest of the string?

Here is an example: ¹ºUnless otherwise indicated. How can I remove the superscript ¹º?

Thanks in advance!

Upvotes: 2

Views: 3051

Answers (2)

awarrier99

Reputation: 3855

ASCII characters have ordinal values in range(128), that is, 0 through 127 (range excludes its upper bound and starts at 0 when no lower bound is given). Subscript and superscript characters are not part of ASCII, so you can strip them by dropping every character outside this range. Note that this also removes every other non-ASCII character, such as accented letters:

>>> x = '¹ºUnless otherwise indicated'
>>> y = ''.join([i for i in x if ord(i) < 128])
>>> y
'Unless otherwise indicated'

This iterates over the characters of x, keeps only those in the ASCII range, and joins the resulting list of characters back into a str.
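As a rough sketch of the same filter wrapped in a function (the name keep_ascii is made up for illustration), the second call shows the side effect on ordinary non-ASCII letters:

```python
def keep_ascii(text):
    """Keep only characters whose code point is below 128 (the ASCII range)."""
    return "".join(ch for ch in text if ord(ch) < 128)

print(keep_ascii("¹ºUnless otherwise indicated"))  # Unless otherwise indicated
print(keep_ascii("naïve x²"))                      # nave x  (the ï is lost as well)
```

If your text may contain accented or non-Latin letters that you want to keep, this filter is probably too aggressive.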

Upvotes: 2

adrtam

Reputation: 7231

The only sure way is to enumerate every superscript and subscript symbol that might occur and remove the characters that match this set.
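A minimal sketch of that enumeration, assuming a hand-picked character list (the names SUPERSCRIPTS, SUBSCRIPTS, and strip_scripts are hypothetical, and the lists are not exhaustive, so extend them for your data):

```python
# Hypothetical, hand-picked lists: add whatever symbols occur in your data.
SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿⁱªº"
SUBSCRIPTS = "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎"

# With three arguments, str.maketrans maps every character in the third
# argument to None, so translate() deletes it.
REMOVE_TABLE = str.maketrans("", "", SUPERSCRIPTS + SUBSCRIPTS)

def strip_scripts(text):
    """Delete every character listed in the tables above."""
    return text.translate(REMOVE_TABLE)

print(strip_scripts("¹ºUnless otherwise indicated"))  # Unless otherwise indicated
print(strip_scripts("H₂O and x²"))                    # HO and x
```

Because only the listed characters are touched, accented letters and other scripts pass through unchanged.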

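If your strings are fairly ordinary, you can instead filter on the Unicode general categories "Lo" (letter, other) and "No" (number, other). These categories cover many characters besides superscripts and subscripts, and they do not include every superscript or subscript symbol, so treat this as an approximation. For example: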

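import unicodedata

s = "¹ºUnless otherwise indicated"
# Drop characters in the "No" (number, other) and "Lo" (letter, other) categories.
cleaned = "".join(c for c in s if unicodedata.category(c) not in ("No", "Lo"))
print(cleaned)  # Unless otherwise indicated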
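A possible middle ground, not part of the answer above: the Unicode database tags many of these characters with a <super> or <sub> compatibility decomposition, which unicodedata.decomposition exposes. The sketch below assumes that tag is a good enough signal for your data; the function names are made up, and coverage depends on the Unicode version bundled with your Python.

```python
import unicodedata

def is_script_char(ch):
    """True if the Unicode database tags ch as a <super> or <sub> variant."""
    return unicodedata.decomposition(ch).startswith(("<super>", "<sub>"))

def strip_scripts_by_decomposition(text):
    """Drop characters tagged <super>/<sub>; leave everything else alone."""
    return "".join(ch for ch in text if not is_script_char(ch))

print(strip_scripts_by_decomposition("¹ºUnless otherwise indicated"))  # Unless otherwise indicated
print(strip_scripts_by_decomposition("H₂O 漢字"))                       # HO 漢字
```

Unlike the category filter, this keeps "Lo" characters such as the CJK ideographs in the second example.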

Upvotes: 2
