arnoapp

Reputation: 2506

Swift: Split String into sentences

I'm wondering how I can split a string containing several sentences into an array of the sentences.

I know about the split function, but splitting by "." isn't suitable for all cases.

Is there something like what is mentioned in this answer?

Upvotes: 1

Views: 2512

Answers (4)

Paul B

Reputation: 5115

If you can use Apple's Foundation, the solution is quite straightforward.

import Foundation

var text = """
    Let's split some text into sentences.
    The text might include dates like Jan.13, 2020, words like S.A.E and numbers like 2.2 or $9,999.99 as well as emojis like 👨‍👩‍👧‍👦! How do I split this?
"""
var sentences: [String] = []
// With .bySentences, the first closure argument is the sentence substring
text.enumerateSubstrings(in: text.startIndex..., options: [.localized, .bySentences]) { (sentence, _, _, _) in
    sentences.append(sentence ?? "")
}

There are ways to do it in pure Swift, of course. Here is a quick and dirty split:

let simpleText = """
This is a very simple text.
It doesn't include dates, abbreviations, and numbers, but it includes emojis like 👨‍👩‍👧‍👦! How do I split this?
"""

let sentencesPureSwift = simpleText.split(omittingEmptySubsequences: true) { $0.isPunctuation && !Set("',").contains($0) }

It could be refined with reduce().
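As a rough illustration of the reduce() idea, here is a minimal sketch that keeps the terminating punctuation attached to each sentence, which the quick split above drops. The function name splitKeepingTerminators and the choice of ".", "!" and "?" as terminators are assumptions made for this example, not part of the code above.

```swift
// Hypothetical helper: builds sentences character by character with reduce(into:).
func splitKeepingTerminators(_ text: String) -> [String] {
    // Assumption: only these three characters end a sentence.
    let terminators: Set<Character> = [".", "!", "?"]
    var state = text.reduce(into: (done: [String](), current: "")) { state, ch in
        // Skip whitespace that sits between two sentences.
        if state.current.isEmpty && ch.isWhitespace { return }
        state.current.append(ch)
        if terminators.contains(ch) {
            // A terminator closes the sentence being built.
            state.done.append(state.current)
            state.current = ""
        }
    }
    // Keep any trailing text that has no terminator.
    if !state.current.isEmpty { state.done.append(state.current) }
    return state.done
}

let parts = splitKeepingTerminators("This is simple. Is it? Yes!")
```

Like the quick split, this treats every terminator as a boundary, so abbreviations, decimals and ellipses will still be cut in the wrong places.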

Upvotes: 5

techonic

Reputation: 211

You can use NSLinguisticTagger to identify SentenceTerminator tokens and then split the text into an array of strings from there.

I used this code and it worked great.

https://stackoverflow.com/a/57985302/10736184

import Foundation

let text = "My paragraph with weird punctuation like Nov. 17th."
// Filled by linguisticTags with the range of each token
var tokenRanges = [Range<String.Index>]()
let tags = text.linguisticTags(
    in: text.startIndex..<text.endIndex,
    scheme: NSLinguisticTagScheme.lexicalClass.rawValue,
    tokenRanges: &tokenRanges)
var result = [String]()
// Start index of every token tagged as a sentence terminator
let ixs = tags.enumerated().filter {
    $0.1 == "SentenceTerminator"
}.map { tokenRanges[$0.0].lowerBound }
var prev = text.startIndex
for ix in ixs {
    // Slice from the end of the previous sentence through the terminator
    let sentenceRange = prev...ix
    result.append(
        text[sentenceRange].trimmingCharacters(in: .whitespaces))
    prev = text.index(after: ix)
}

result will now be an array of sentence strings. Note that a sentence has to be terminated with '?', '!', '.', etc. to count. If you want to split on newlines as well, or on other lexical classes, you can add

|| $0.1 == "ParagraphBreak"

after

$0.1 == "SentenceTerminator"

to do that.

Upvotes: 4

Sk Rejabul

Reputation: 7

Try this:

    import Foundation

    let myString = "This is a test"
    let myWords = myString.components(separatedBy: " ")
    // myWords is now: ["This", "is", "a", "test"]

Upvotes: -3

jregnauld

Reputation: 1298

Take a look at this link: How to create String split extension with regex in Swift?

It shows how to combine regex and componentsSeparatedByString.
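The linked code is not reproduced here, so the following is only a guess at the general idea, not that answer's implementation. It is a minimal sketch that uses a regex to mark sentence boundaries and then splits on the marker with components(separatedBy:), the current spelling of componentsSeparatedByString. The function name splitSentences, the pattern, and the marker character are all assumptions made for this example.

```swift
import Foundation

// Hypothetical helper: splits after '.', '!' or '?' when followed by whitespace.
func splitSentences(_ text: String) -> [String] {
    // Assumption: the unit separator character never appears in the input.
    let marker = "\u{1F}"
    // Whitespace run that is preceded by a sentence terminator.
    let pattern = "(?<=[.!?])\\s+"
    let marked = text.replacingOccurrences(
        of: pattern, with: marker, options: .regularExpression)
    return marked.components(separatedBy: marker)
}

let split = splitSentences("Is it late? Yes! Go home.")
```

Because the pattern requires whitespace after the terminator, a number like 2.2 stays intact, but an abbreviation such as "Nov. 17th" would still be split.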

Upvotes: 0
