mitchkman
mitchkman

Reputation: 6680

Swift extract regex matches

I want to extract substrings from a string that match a regex pattern.

So I'm looking for something like this:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {
   ???
}

So this is what I have:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {

    var regex = NSRegularExpression(pattern: regex, 
        options: nil, error: nil)

    var results = regex.matchesInString(text, 
        options: nil, range: NSMakeRange(0, countElements(text))) 
            as Array<NSTextCheckingResult>

    /// ???

    return ...
}

The problem is, that matchesInString delivers me an array of NSTextCheckingResult, where NSTextCheckingResult.range is of type NSRange.

NSRange is incompatible with Range<String.Index>, so it prevents me of using text.substringWithRange(...)

Any idea how to achieve this simple thing in swift without too many lines of code?

Upvotes: 236

Views: 196811

Answers (16)

Pranav Kasetti
Pranav Kasetti

Reputation: 9915

Update for iOS 16: Regex, RegexBuilder πŸ‘·β€β™€οΈ

Xcode previously supported Regex with Apple's NSRegularExpression. The Swift API was verbose and challenging to use correctly, so Apple released Regex Literal support and RegexBuilder this year. The Regex flavor used by Regex types is the same as NSRegularExpression, i.e. the ICU Unicode specification.

The API has been simplified going forward to tidy up complex String range-based parsing logic in iOS 16 / macOS 13 as well as improve performance.

Another advantage of using literals is that we get compile time errors in case we use invalid RegEx syntax: Cannot parse regular expression... with a clear description of the RegEx error. Enjoy!

RegEx literals in Swift 5.7

func parseLine(_ line: Substring) throws -> MailmapEntry {

    let regex = /\h*([^<#]+?)??\h*<([^>#]+)>\h*(?:#|\Z)/

    guard let match = line.prefixMatch(of: regex) else {
        throw MailmapError.badLine
    }

    return MailmapEntry(name: match.1, email: match.2)
}

We are able to match using:

  1. firstMatch(of:): Returns the first match for the regex within this collection, where the regex is created by the given closure (RegEx literal).

  2. prefixMatch(of:): Returns a match if this string is matched by the given regex at its start.

  3. wholeMatch(of:): Matches a regex in its entirety, where the regex is created by the given closure (RegEx literal).

  4. matches(of:): Returns a collection containing all non-overlapping matches of the regex, created by the given closure (RegEx literal).

I've linked to the docs above. The new RegEx literal syntax has multiple new APIs such as trimmingPrefix(), contains() and more, so I do encourage exploring the docs further for more nuanced use cases.

There is equivalent syntax of the above methods where we call prefixMatch(in:) on the Regex literal itself and pass in the string to search in. I prefer the syntax above however choose whichever you prefer.

Example code:

let aOrB = /[ab]+/

if let stringMatch = try aOrB.firstMatch(in: "The year is 2022; last year was 2021.") {
    print(stringMatch.0)
} else {
    print("No match.")
}
// prints "a"

RegexBuilder in Swift 5.7

RegexBuilder is a new API released by Apple aimed at making RegEx code easier to write in Swift. We can translate the Regex literal /\h*([^<#]+?)??\h*<([^>#]+)>\h*(?:#|\Z)/ from above into a more declarative form using RegexBuilder if we want more readability.

Do note that we can use raw strings in a RegexBuilder and also interleave Regex Literals in the builder if we want to balance readability with conciseness.

import RegexBuilder

let regex = Regex {
    ZeroOrMore(.horizontalWhitespace)
    Optionally {
        Capture(OneOrMore(.noneOf("<#")))
    }
        .repetitionBehavior(.reluctant)
    ZeroOrMore(.horizontalWhitespace)
    "<"
    Capture(OneOrMore(.noneOf(">#")))
    ">"
    ZeroOrMore(.horizontalWhitespace)
    /#|\Z/
}

The RegEx literal /#|\Z/ is equivalent to:

ChoiceOf {
   "#"
   Anchor.endOfSubjectBeforeNewline
}

Composable RegexComponent

RegexBuilder syntax is similar to SwiftUI also in terms of composability because we can reuse RegexComponents within other RegexComponents:

struct MailmapLine: RegexComponent {
    @RegexComponentBuilder
    var regex: Regex<(Substring, Substring?, Substring)> {
        ZeroOrMore(.horizontalWhitespace)
        Optionally {
            Capture(OneOrMore(.noneOf("<#")))
        }
            .repetitionBehavior(.reluctant)
        ZeroOrMore(.horizontalWhitespace)
        "<"
        Capture(OneOrMore(.noneOf(">#")))
        ">"
        ZeroOrMore(.horizontalWhitespace)
        ChoiceOf {
           "#"
            Anchor.endOfSubjectBeforeNewline
        }
    }
}

Source: Some of this code is taken from the WWDC 2022 video "What's new in Swift".

Upvotes: 16

inigo333
inigo333

Reputation: 3396

On iOS 16 there is new syntax that makes this way easier. For example, for anything within brackets in this string

let randomLog = "2493875469750,1678798470864,{latitude: 50, longitude: 43}"

if let match = randomLog.firstMatch(of: /\{.*\}/) {
    print(match.output)
}

This prints

"{"latitude": 50, "longitude": 43}"

In order to become a Swift Regex Pro, or just for more info, have a look at WWDC 2022: https://developer.apple.com/videos/play/wwdc2022/110357/

Upvotes: 2

Mojtaba Hosseini
Mojtaba Hosseini

Reputation: 119128

You can use matching(regex:) on the string like:

let array = try "Your String To Search".matching(regex: ".")

using this simple extension:

public extension String {
    func matching(regex: String) throws -> [String] {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: self, range: NSRange(startIndex..., in: self))
        return results.map { String(self[Range($0.range, in: self)!]) }
    }
}

Upvotes: 1

Dzhek
Dzhek

Reputation: 101

basic phone number matching

let phoneNumbers = ["+79990001101", "+7 (800) 000-11-02", "+34 507 574 147 ", "+1-202-555-0118"]

let match: (String) -> String = {
    $0.replacingOccurrences(of: #"[^\d+]"#, with: "", options: .regularExpression)
}

print(phoneNumbers.map(match))
// ["+79990001101", "+78000001102", "+34507574147", "+12025550118"]

Upvotes: 1

Ken Mueller
Ken Mueller

Reputation: 4163

The fastest way to return all matches and capture groups in Swift 5

extension String {
    func match(_ regex: String) -> [[String]] {
        let nsString = self as NSString
        return (try? NSRegularExpression(pattern: regex, options: []))?.matches(in: self, options: [], range: NSMakeRange(0, nsString.length)).map { match in
            (0..<match.numberOfRanges).map { match.range(at: $0).location == NSNotFound ? "" : nsString.substring(with: match.range(at: $0)) }
        } ?? []
    }
}

Returns a 2-dimentional array of strings:

"prefix12suffix fix1su".match("fix([0-9]+)su")

returns...

[["fix12su", "12"], ["fix1su", "1"]]

// First element of sub-array is the match
// All subsequent elements are the capture groups

Upvotes: 38

dengST30
dengST30

Reputation: 4037

update @Mike Chirico's to Swift 5

extension String{



  func regex(pattern: String) -> [String]?{
    do {
        let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpression.Options(rawValue: 0))
        let all = NSRange(location: 0, length: count)
        var matches = [String]()
        regex.enumerateMatches(in: self, options: NSRegularExpression.MatchingOptions(rawValue: 0), range: all) {
            (result : NSTextCheckingResult?, _, _) in
              if let r = result {
                    let nsstr = self as NSString
                    let result = nsstr.substring(with: r.range) as String
                    matches.append(result)
              }
        }
        return matches
    } catch {
        return nil
    }
  }
}

Upvotes: 2

shiami
shiami

Reputation: 7264

Swift 4 without NSString.

extension String {
    func matches(regex: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: [.caseInsensitive]) else { return [] }
        let matches  = regex.matches(in: self, options: [], range: NSMakeRange(0, self.count))
        return matches.map { match in
            return String(self[Range(match.range, in: self)!])
        }
    }
}

Upvotes: 7

Lars Blumberg
Lars Blumberg

Reputation: 21361

My answer builds on top of given answers but makes regex matching more robust by adding additional support:

  • Returns not only matches but returns also all capturing groups for each match (see examples below)
  • Instead of returning an empty array, this solution supports optional matches
  • Avoids do/catch by not printing to the console and makes use of the guard construct
  • Adds matchingStrings as an extension to String

Swift 4.2

//: Playground - noun: a place where people can play

import Foundation

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.range(at: $0).location != NSNotFound
                    ? nsString.substring(with: result.range(at: $0))
                    : ""
            }
        }
    }
}

"prefix12 aaa3 prefix45".matchingStrings(regex: "fix([0-9])([0-9])")
// Prints: [["fix12", "1", "2"], ["fix45", "4", "5"]]

"prefix12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["prefix12", "12"]]

"12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["12", "12"]], other answers return an empty array here

// Safely accessing the capture of the first match (if any):
let number = "prefix12suffix".matchingStrings(regex: "fix([0-9]+)su").first?[1]
// Prints: Optional("12")

Swift 3

//: Playground - noun: a place where people can play

import Foundation

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.rangeAt($0).location != NSNotFound
                    ? nsString.substring(with: result.rangeAt($0))
                    : ""
            }
        }
    }
}

"prefix12 aaa3 prefix45".matchingStrings(regex: "fix([0-9])([0-9])")
// Prints: [["fix12", "1", "2"], ["fix45", "4", "5"]]

"prefix12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["prefix12", "12"]]

"12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["12", "12"]], other answers return an empty array here

// Safely accessing the capture of the first match (if any):
let number = "prefix12suffix".matchingStrings(regex: "fix([0-9]+)su").first?[1]
// Prints: Optional("12")

Swift 2

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matchesInString(self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.rangeAtIndex($0).location != NSNotFound
                    ? nsString.substringWithRange(result.rangeAtIndex($0))
                    : ""
            }
        }
    }
}

Upvotes: 75

Vasco
Vasco

Reputation: 911

Big thanks to Lars Blumberg his answer for capturing groups and full matches with Swift 4, which helped me out a lot. I also made an addition to it for the people who do want an error.localizedDescription response when their regex is invalid:

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        do {
            let regex = try NSRegularExpression(pattern: regex)
            let nsString = self as NSString
            let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
            return results.map { result in
                (0..<result.numberOfRanges).map {
                    result.range(at: $0).location != NSNotFound
                        ? nsString.substring(with: result.range(at: $0))
                        : ""
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}

For me having the localizedDescription as error helped understand what went wrong with escaping, since it's displays which final regex swift tries to implement.

Upvotes: 1

valexa
valexa

Reputation: 4503

Most of the solutions above only give the full match as a result ignoring the capture groups e.g.: ^\d+\s+(\d+)

To get the capture group matches as expected you need something like (Swift4) :

public extension String {
    public func capturedGroups(withRegex pattern: String) -> [String] {
        var results = [String]()

        var regex: NSRegularExpression
        do {
            regex = try NSRegularExpression(pattern: pattern, options: [])
        } catch {
            return results
        }
        let matches = regex.matches(in: self, options: [], range: NSRange(location:0, length: self.count))

        guard let match = matches.first else { return results }

        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }

        for i in 1...lastRangeIndex {
            let capturedGroupIndex = match.range(at: i)
            let matchedString = (self as NSString).substring(with: capturedGroupIndex)
            results.append(matchedString)
        }

        return results
    }
}

Upvotes: 4

Martin R
Martin R

Reputation: 539685

Even if the matchesInString() method takes a String as the first argument, it works internally with NSString, and the range parameter must be given using the NSString length and not as the Swift string length. Otherwise it will fail for "extended grapheme clusters" such as "flags".

As of Swift 4 (Xcode 9), the Swift standard library provides functions to convert between Range<String.Index> and NSRange.

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            String(text[Range($0.range, in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "πŸ‡©πŸ‡ͺ€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]

Note: The forced unwrap Range($0.range, in: text)! is safe because the NSRange refers to a substring of the given string text. However, if you want to avoid it then use

        return results.flatMap {
            Range($0.range, in: text).map { String(text[$0]) }
        }

instead.


(Older answer for Swift 3 and earlier:)

So you should convert the given Swift string to an NSString and then extract the ranges. The result will be converted to a Swift string array automatically.

(The code for Swift 1.2 can be found in the edit history.)

Swift 2 (Xcode 7.3.1) :

func matchesForRegexInText(regex: String, text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text,
                                            options: [], range: NSMakeRange(0, nsString.length))
        return results.map { nsString.substringWithRange($0.range)}
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "πŸ‡©πŸ‡ͺ€4€9"
let matches = matchesForRegexInText("[0-9]", text: string)
print(matches)
// ["4", "9"]

Swift 3 (Xcode 8)

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range)}
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "πŸ‡©πŸ‡ͺ€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]

Upvotes: 387

Jorge Osorio
Jorge Osorio

Reputation: 21

This is a very simple solution that returns an array of string with the matches

Swift 3.

internal func stringsMatching(regularExpressionPattern: String, options: NSRegularExpression.Options = []) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: regularExpressionPattern, options: options) else {
            return []
        }

        let nsString = self as NSString
        let results = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))

        return results.map {
            nsString.substring(with: $0.range)
        }
    }

Upvotes: 2

Rob Mecham
Rob Mecham

Reputation: 613

I found that the accepted answer's solution unfortunately does not compile on Swift 3 for Linux. Here's a modified version, then, that does:

import Foundation

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try RegularExpression(pattern: regex, options: [])
        let nsString = NSString(string: text)
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

The main differences are:

  1. Swift on Linux seems to require dropping the NS prefix on Foundation objects for which there is no Swift-native equivalent. (See Swift evolution proposal #86.)

  2. Swift on Linux also requires specifying the options arguments for both the RegularExpression initialization and the matches method.

  3. For some reason, coercing a String into an NSString doesn't work in Swift on Linux but initializing a new NSString with a String as the source does work.

This version also works with Swift 3 on macOS / Xcode with the sole exception that you must use the name NSRegularExpression instead of RegularExpression.

Upvotes: 12

OliverD
OliverD

Reputation: 1204

@p4bloch if you want to capture results from a series of capture parentheses, then you need to use the rangeAtIndex(index) method of NSTextCheckingResult, instead of range. Here's @MartinR 's method for Swift2 from above, adapted for capture parentheses. In the array that is returned, the first result [0] is the entire capture, and then individual capture groups begin from [1]. I commented out the map operation (so it's easier to see what I changed) and replaced it with nested loops.

func matches(for regex: String!, in text: String!) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text, options: [], range: NSMakeRange(0, nsString.length))
        var match = [String]()
        for result in results {
            for i in 0..<result.numberOfRanges {
                match.append(nsString.substringWithRange( result.rangeAtIndex(i) ))
            }
        }
        return match
        //return results.map { nsString.substringWithRange( $0.range )} //rangeAtIndex(0)
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

An example use case might be, say you want to split a string of title year eg "Finding Dory 2016" you could do this:

print ( matches(for: "^(.+)\\s(\\d{4})" , in: "Finding Dory 2016"))
// ["Finding Dory 2016", "Finding Dory", "2016"]

Upvotes: 5

Mike Chirico
Mike Chirico

Reputation: 3491

If you want to extract substrings from a String, not just the position, (but the actual String including emojis). Then, the following maybe a simpler solution.

extension String {
  func regex (pattern: String) -> [String] {
    do {
      let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions(rawValue: 0))
      let nsstr = self as NSString
      let all = NSRange(location: 0, length: nsstr.length)
      var matches : [String] = [String]()
      regex.enumerateMatchesInString(self, options: NSMatchingOptions(rawValue: 0), range: all) {
        (result : NSTextCheckingResult?, _, _) in
        if let r = result {
          let result = nsstr.substringWithRange(r.range) as String
          matches.append(result)
        }
      }
      return matches
    } catch {
      return [String]()
    }
  }
} 

Example Usage:

"someText πŸ‘ΏπŸ…πŸ‘Ώβš½οΈ pig".regex("πŸ‘Ώβš½οΈ")

Will return the following:

["πŸ‘Ώβš½οΈ"]

Note using "\w+" may produce an unexpected ""

"someText πŸ‘ΏπŸ…πŸ‘Ώβš½οΈ pig".regex("\\w+")

Will return this String array

["someText", "️", "pig"]

Upvotes: 15

Dalorzo
Dalorzo

Reputation: 20014

This is how I did it, I hope it brings a new perspective how this works on Swift.

In this example below I will get the any string between []

var sample = "this is an [hello] amazing [world]"

var regex = NSRegularExpression(pattern: "\\[.+?\\]"
, options: NSRegularExpressionOptions.CaseInsensitive 
, error: nil)

var matches = regex?.matchesInString(sample, options: nil
, range: NSMakeRange(0, countElements(sample))) as Array<NSTextCheckingResult>

for match in matches {
   let r = (sample as NSString).substringWithRange(match.range)//cast to NSString is required to match range format.
    println("found= \(r)")
}

Upvotes: 2

Related Questions