Reputation: 733
I want to split this UTF-16 string in Swift 5:
ddd¾̷̱̲͈́͌͠ͰͿΔδόϡϫЍа
Delimiter: "¾"
I've tried the following code:
let Arr = "ddd¾̷̱̲͈́͌͠ͰͿΔδόϡϫЍа".split{$0 == "¾"}.map(String.init)
let Arr = "ddd¾̷̱̲͈́͌͠ͰͿΔδόϡϫЍа".components(separatedBy: "¾")
but both attempts failed.
Upvotes: 4
Views: 214
Reputation: 30564
I made an extension! This doesn't have the side effect of changing Ѝ into И.
import Foundation // needed for folding(options:locale:), components(separatedBy:) and range(of:range:)

let delimiter: Character = "¾" // the delimiter
let string = "ddd¾̷̱̲͈́͌͠ͰͿΔδόϡϫЍа"
let arr = string.components(separatedBySpecialCharacter: delimiter)
print(arr) // ["ddd", "ͰͿΔδόϡϫЍа"]

extension String {
    func components(separatedBySpecialCharacter delimiter: Character) -> [String] {
        // Remove all accents and diacritics so the bare delimiter can be found.
        let cleanedString = self.folding(options: .diacriticInsensitive, locale: .current)
        // Character offsets of the delimiter in the cleaned string.
        // This relies on the cleaned string having the same number of Characters as the original.
        let indicesOfDelimiter = cleanedString.indicesOf(string: String(delimiter))
        // Split the original string into an array of Characters.
        var stringCharacters = Array(self)
        for index in indicesOfDelimiter {
            // Replace each accented delimiter with a clean delimiter.
            stringCharacters[index] = delimiter
        }
        // Rebuild the original string, now with clean delimiters.
        let delimiterCleanedString = String(stringCharacters)
        // Finally, split on the delimiter that was passed in.
        return delimiterCleanedString.components(separatedBy: String(delimiter))
    }

    /// Returns the Character offsets at which `string` occurs inside this String.
    /// From https://stackoverflow.com/a/40413665/14351818
    func indicesOf(string: String) -> [Int] {
        var indices = [Int]()
        var searchStartIndex = self.startIndex
        while searchStartIndex < self.endIndex,
              let range = self.range(of: string, range: searchStartIndex..<self.endIndex),
              !range.isEmpty
        {
            let index = distance(from: self.startIndex, to: range.lowerBound)
            indices.append(index)
            searchStartIndex = range.upperBound
        }
        return indices
    }
}
Old answer:
The "¾̷̱̲͈́͌͠" inside "ddd¾̷̱̲͈́͌͠ͰͿΔδόϡϫЍа" has a lot of diacritics/zalgo text on it. You can first clean it up like this:
let string = "ddd¾̷̱̲͈́͌͠ͰͿΔδόϡϫЍа"
let cleanedString = string.folding(options: .diacriticInsensitive, locale: .current)
print(cleanedString)
Result:
ddd¾ͰͿΔδοϡϫИа
Now you can use components(separatedBy: "¾") on the cleaned string.
let arr = cleanedString.components(separatedBy: "¾")
print(arr)
Result:
["ddd", "ͰͿΔδοϡϫИа"]
Note that this also changes Ѝ to И (and ό to ο), because folding strips diacritics from every character, not just the delimiter. I will see if there is a better solution.
Upvotes: 1
Reputation: 299663
The Element of String is Character. A Character is an extended grapheme cluster, which means it composes all combining characters. The Character in this String is ¾̷̱̲͈́͌͠, so when you try to split on ¾, it's not found.
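To make the grapheme-cluster behaviour concrete, here is a minimal sketch. It uses a shortened stand-in string (an assumption, not the string from the question): "¾" followed by two combining marks, written with explicit escapes so the code points are unambiguous. The variable names are illustrative only.

```swift
// Hypothetical sample: "ddd", then U+00BE (¾) with two combining marks, then "abc".
let sample = "ddd\u{BE}\u{0301}\u{0331}abc"

// The bare delimiter, once as a Character and once as a Unicode.Scalar.
let bareCharacter: Character = "\u{BE}"
let bareScalar: Unicode.Scalar = "\u{BE}"

// "¾" and its combining marks form a single Character.
let characterCount = sample.count               // 7
let scalarCount = sample.unicodeScalars.count   // 9

// No Character in the string equals the bare "¾", but the scalar is present.
let foundAsCharacter = sample.firstIndex(of: bareCharacter) != nil   // false
let foundAsScalar = sample.unicodeScalars.contains(bareScalar)       // true

print(characterCount, scalarCount, foundAsCharacter, foundAsScalar)
```

The same mismatch is why a Character-based split on the string from the question finds no delimiter.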
I believe what you're trying to operate on is UnicodeScalars, which are individual code points. To do that, you need to first call .unicodeScalars:
let arr = "ddd¾̷̱̲͈́͌͠ͰͿΔδόϡϫЍа".unicodeScalars.split(separator: "¾").map(String.init)
// ["ddd", "̷̱̲͈́͌͠ͰͿΔδόϡϫЍа"]
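The second component above still begins with the combining marks that were attached to the delimiter. If those are unwanted, one possible approach is to drop leading nonspacing marks from each piece after splitting. This is only a sketch on a shortened stand-in string (an assumption, not the string from the question), and it assumes the stray marks all have the Unicode general category "nonspacing mark".

```swift
// Hypothetical sample: "ddd", then U+00BE (¾) with two combining marks, then "abc".
let sample = "ddd\u{BE}\u{0301}\u{0331}abc"
let separator: Unicode.Scalar = "\u{BE}"

let parts: [String] = sample.unicodeScalars
    .split(separator: separator)
    .map { piece in
        // Drop combining marks left at the front of the piece by the removed delimiter.
        let trimmed = piece.drop(while: { $0.properties.generalCategory == .nonspacingMark })
        // Rebuild a String from the remaining scalars.
        return String(String.UnicodeScalarView(trimmed))
    }

print(parts) // ["ddd", "abc"]
```

Unlike folding the whole string, this leaves accented characters elsewhere in the text untouched.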
Note that the string you've posted here is UTF-8, not UTF-16. Swift can't operate directly on UTF-16 literals (you typically store them as Data or [UInt16] and then convert them to String). I don't believe this changes your question, however.
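For completeness, a minimal sketch of converting UTF-16 input to a String. The code units below are made up for illustration, and the Data variant assumes little-endian byte order without a byte order mark.

```swift
import Foundation

// Hypothetical UTF-16 code units: "ddd", U+00BE (¾), U+0301 (combining acute accent).
let codeUnits: [UInt16] = [0x0064, 0x0064, 0x0064, 0x00BE, 0x0301]

// From [UInt16]: decode the code units directly.
let fromUnits = String(decoding: codeUnits, as: UTF16.self)

// From Data: lay the code units out as little-endian bytes, then decode them.
let bytes: [UInt8] = codeUnits.flatMap { [UInt8($0 & 0xFF), UInt8($0 >> 8)] }
let fromData = String(data: Data(bytes), encoding: .utf16LittleEndian)

print(fromUnits.unicodeScalars.count) // 5
```

Once the text is a String, the .unicodeScalars approach works the same way regardless of the original encoding.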
Upvotes: 4