Reputation: 118
How can I find the word that precedes one of the superscript characters [¹²³⁴⁵⁶⁷⁸⁹⁰]? For example:
let myString = "Regular expressions¹ consist of constants, ² and operator symbols...³"
Please provide a pattern that selects the characters from the start of the target word through the superscript:
"expressions¹", "constants, ²", "symbols...³"
and a pattern that selects only the target word:
"expressions", "constants", "symbols"
Upvotes: 0
Views: 104
Reputation:
This will match your examples. The superscripts are written as code points:
\b\w+\W*[\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+
From Wikipedia:
The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into those positions in the Latin-1 range of Unicode. The rest were placed in a dedicated section of Unicode at U+2070 to U+209F.
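For the second part of the question (only the target word), one option is to wrap `\w+` in a capture group and read group 1. The sketch below assumes Swift with Foundation's `NSRegularExpression`; the capture group, the `\x{2074}-\x{2079}` range shorthand, and the helper name `superscriptMatches` are my own additions for illustration, not part of the pattern above.

```swift
import Foundation

// Superscript digits ⁰¹²³⁴⁵⁶⁷⁸⁹ as ICU \x{...} escapes (U+2074...U+2079 written as a range).
let superscripts = "[\\x{2070}\\x{B9}\\x{B2}\\x{B3}\\x{2074}-\\x{2079}]"

// Group 1 captures only the word; the whole match runs through the superscript.
let pattern = "\\b(\\w+)\\W*" + superscripts + "+"

// Hypothetical helper: returns (whole match, word only) for every hit.
func superscriptMatches(in text: String) -> [(full: String, word: String)] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { match in
        guard let fullRange = Range(match.range, in: text),
              let wordRange = Range(match.range(at: 1), in: text) else { return nil }
        return (String(text[fullRange]), String(text[wordRange]))
    }
}

let myString = "Regular expressions¹ consist of constants, ² and operator symbols...³"
let results = superscriptMatches(in: myString)
for result in results {
    print("\"\(result.full)\" -> \"\(result.word)\"")
}
```

The whole match answers the first request and group 1 answers the second, so a single pattern covers both.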
Update:
To get separate blocks that start with words or non-words, you can just
exclude the superscript range from the non-word class.
The regex is longer and more redundant, but it works.
(?:\b\w+[^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]*|[^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+)[\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+
Formatted:
(?:
     \b
     # Required - Words
     \w+
     # Optional - Not words, nor superscript
     [^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]*
  |  # or,
     # Required - Not words, nor superscript
     [^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+
)
# Required - Superscript
[\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+
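As a rough illustration of how the formatted version could be used from Swift, the sketch below passes it to `NSRegularExpression` with the `.allowCommentsAndWhitespace` option so the layout and `#` comments can stay in the pattern. It assumes Swift 5 raw string literals; the shared `sup` constant (which removes some of the redundancy) and the helper name `blocks` are hypothetical.

```swift
import Foundation

// Contents of the superscript class, shared by all three bracket expressions.
let sup = #"\x{B9}\x{B2}\x{B3}\x{2070}\x{2074}-\x{2079}"#

// Free-spacing pattern: whitespace is ignored and # starts a comment.
let verbosePattern = #"""
(?:
    \b \w+            # required: a word
    [^\w\#(sup)]*     # optional: neither word characters nor superscripts
  |
    [^\w\#(sup)]+     # required: neither word characters nor superscripts
)
[\#(sup)]+            # required: one or more superscripts
"""#

// Hypothetical helper: returns every block that ends in a superscript.
func blocks(in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(
        pattern: verbosePattern,
        options: [.allowCommentsAndWhitespace]
    ) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// The second block has no word in front of its superscript.
let found = blocks(in: "Title¹ ... ² end")
print(found)
```

With this input, the first alternative should pick up "Title¹" and the second alternative the word-less block " ... ²", which the shorter pattern would skip because it requires `\w+`.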
Upvotes: 1
Reputation: 17544
Based on sin's or Caleb Kleveter's information:
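let myString = " expressions¹ consist of 元機經中有關文字排版² and operator symbols³"
// Superscript digits ⁰¹²³⁴⁵⁶⁷⁸⁹, used here as footnote markers
let noteIdx = "\u{2070}\u{00b9}\u{00b2}\u{00b3}\u{2074}\u{2075}\u{2076}\u{2077}\u{2078}\u{2079}"
// Split the scalar view wherever a superscript digit occurs
let strs = myString.unicodeScalars.split { noteIdx.unicodeScalars.contains($0) }
strs.forEach {
    print(String($0)) // convert the scalar-view slice back to a String
}
/* prints (each piece keeps its leading space)
 expressions
 consist of 元機經中有關文字排版
 and operator symbols
*/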
This is just a skeleton; you can build on it if you want.
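One possible way to continue, as a sketch only: split on the superscripts, then walk each piece backwards to pick out its last word. The helper names `isWordScalar` and `wordsBeforeSuperscripts` are hypothetical, and treating "alphabetic or decimal digit" as a word character is my assumption, so words containing apostrophes or hyphens would be cut short.

```swift
// Superscript digits that act as footnote markers.
let superscriptDigits = Set("⁰¹²³⁴⁵⁶⁷⁸⁹".unicodeScalars)

// Assumption: a "word" consists of alphabetic scalars and decimal digits.
func isWordScalar(_ s: Unicode.Scalar) -> Bool {
    return s.properties.isAlphabetic || s.properties.generalCategory == .decimalNumber
}

func wordsBeforeSuperscripts(in text: String) -> [String] {
    // Keep empty pieces so the final piece is always the text after the
    // last superscript; that piece is dropped below.
    let pieces = text.unicodeScalars.split(omittingEmptySubsequences: false) {
        superscriptDigits.contains($0)
    }
    return pieces.dropLast().compactMap { piece -> String? in
        // From the end: skip punctuation and spaces, then collect the word.
        let tail = piece.reversed()
            .drop(while: { !isWordScalar($0) })
            .prefix(while: { isWordScalar($0) })
        guard !tail.isEmpty else { return nil }
        return String(String.UnicodeScalarView(tail.reversed()))
    }
}

let latin = wordsBeforeSuperscripts(
    in: "Regular expressions¹ consist of constants, ² and operator symbols...³")
let mixed = wordsBeforeSuperscripts(
    in: " expressions¹ consist of 元機經中有關文字排版² and operator symbols³")
print(latin)
print(mixed)
```

Because the pieces are examined from the end, text between the word and the superscript (such as ", " or "...") is skipped without needing a regular expression.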
Upvotes: 1