Reputation: 2506
I'm wondering how I can split a string containing several sentences into an array of the sentences.
I know about the split function, but splitting by "." doesn't suit all cases.
Is there something like what is mentioned in this answer?
Upvotes: 1
Views: 2512
Reputation: 5115
If you can use Apple's Foundation, the solution is quite straightforward.
import Foundation
var text = """
Let's split some text into sentences.
The text might include dates like Jan.13, 2020, words like S.A.E and numbers like 2.2 or $9,999.99 as well as emojis like 👨‍👩‍👧‍👦! How do I split this?
"""
var sentences: [String] = []
// The closure runs once per sentence; its first argument is the sentence substring.
text.enumerateSubstrings(in: text.startIndex..., options: [.localized, .bySentences]) { (sentence, _, _, _) in
    sentences.append(sentence ?? "")
}
There are ways to do it in pure Swift, of course. Here is a quick and dirty split:
let simpleText = """
This is a very simple text.
It doesn't include dates, abbreviations, and numbers, but it includes emojis like 👨‍👩‍👧‍👦! How do I split this?
"""
let sentencesPureSwift = simpleText.split(omittingEmptySubsequences:true) { $0.isPunctuation && !Set("',").contains($0)}
It could be refined with reduce().
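As a rough illustration of that refinement, here is a minimal sketch that uses reduce(into:) to keep the terminating punctuation attached to each sentence. The function name splitSentences and the choice of ".", "!" and "?" as terminators are assumptions made for this example, not part of the original answer. Like the quick split above, it only suits simple text: it would still break on abbreviations and on numbers such as 2.2.

```swift
// Hypothetical helper (name and terminator set are assumptions for this sketch).
func splitSentences(_ text: String) -> [String] {
    let terminators: Set<Character> = [".", "!", "?"]

    // Build the sentences one character at a time. The last array element
    // is always the sentence currently being assembled.
    let pieces = text.reduce(into: [""]) { (acc: inout [String], ch: Character) in
        // Skip whitespace and newlines at the start of a sentence.
        if acc[acc.count - 1].isEmpty && ch.isWhitespace { return }
        acc[acc.count - 1].append(ch)
        // A terminator closes the current sentence and opens a new one.
        if terminators.contains(ch) { acc.append("") }
    }

    // Drop the empty piece left over after the final terminator.
    return pieces.filter { !$0.isEmpty }
}

print(splitSentences("This is simple. Is it? Yes!"))
// prints ["This is simple.", "Is it?", "Yes!"]
```

Text after the last terminator is kept as a final element, and its trailing whitespace is not trimmed.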
Upvotes: 5
Reputation: 211
You can use NSLinguisticTagger to identify SentenceTerminator tokens and then split the text into an array of strings from there.
I used this code and it worked great.
https://stackoverflow.com/a/57985302/10736184
import Foundation

let text = "My paragraph with weird punctuation like Nov. 17th."
// r receives the range of every token found by the tagger.
var r = [Range<String.Index>]()
let t = text.linguisticTags(
    in: text.startIndex..<text.endIndex,
    scheme: NSLinguisticTagScheme.lexicalClass.rawValue,
    tokenRanges: &r)
var result = [String]()
// Keep the start index of every token tagged as a sentence terminator.
let ixs = t.enumerated().filter {
    $0.1 == "SentenceTerminator"
}.map { r[$0.0].lowerBound }
var prev = text.startIndex
for ix in ixs {
    // Take everything from the previous cut up to and including the terminator.
    let sentenceRange = prev...ix
    result.append(
        text[sentenceRange].trimmingCharacters(
            in: NSCharacterSet.whitespaces))
    prev = text.index(after: ix)
}
result is now an array of sentence strings. Note that a sentence has to end with '?', '!', '.', etc. to count. If you want to split on newlines or other lexical classes as well, you can add
|| $0.1 == "ParagraphBreak"
after
$0.1 == "SentenceTerminator"
to do that.
Upvotes: 4
Reputation: 7
Try this:
import Foundation
let myString: NSString = "This is a test"
let myWords = myString.components(separatedBy: " ")
// myWords is now: ["This", "is", "a", "test"]
Upvotes: -3
Reputation: 1298
Take a look at this link: How to create String split extension with regex in Swift?
It shows how to combine regex and componentsSeparatedByString.
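For readers who cannot follow the link, a regex-based split could look roughly like the sketch below. It is not the code from the linked question: the extension method pieces(separatedByRegex:) and the pattern are assumptions made for this example. The pattern splits on whitespace that follows ".", "!" or "?", so it would still cut after abbreviations such as "Nov.".

```swift
import Foundation

extension String {
    // Hypothetical helper: splits the string at every match of `pattern`.
    func pieces(separatedByRegex pattern: String) -> [String] {
        // If the pattern is invalid, return the string unsplit.
        guard let regex = try? NSRegularExpression(pattern: pattern) else {
            return [self]
        }
        // NSRegularExpression reports UTF-16 based NSRanges, so slice via NSString.
        let ns = NSString(string: self)
        let full = NSRange(location: 0, length: ns.length)
        var pieces: [String] = []
        var start = 0
        for match in regex.matches(in: self, range: full) {
            let sep = match.range
            // Text between the previous separator and this one.
            pieces.append(ns.substring(with: NSRange(location: start, length: sep.location - start)))
            start = sep.location + sep.length
        }
        // Whatever remains after the last separator.
        pieces.append(ns.substring(from: start))
        return pieces.filter { !$0.isEmpty }
    }
}

// Split on whitespace that is preceded by a sentence terminator.
let parts = "Is this it? Yes! It is.".pieces(separatedByRegex: "(?<=[.!?])\\s+")
print(parts)
// prints ["Is this it?", "Yes!", "It is."]
```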
Upvotes: 0