Will

Reputation: 5490

Regex not matching string correctly

Wondering if you could help me with this problem.

I am trying to match the word that follows a pattern. In this case I'd like to match the word immediately following r/.

Example:

"This is an /r/example for /r/stackoverflow" would match both "example" and "stackoverflow".

I am testing this with https://regex101.com/ where the pattern (?=r\/.*?(\w+)) works well.

However, in Swift it fails, returning an array of empty strings with one entry per occurrence of r/ in the given string.

The implementation is as follows:

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = NSString(string: text)
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

print(matches(for: "(?=r\\/.*?(\\w+))", in: "This is an /r/example for /r/stackoverflow"))

The above code prints ["", ""]

Why is Swift failing here?

Thanks

Upvotes: 1

Views: 339

Answers (1)

Wiktor Stribiżew

Reputation: 627370

The (?=r\/.*?(\w+)) pattern is a positive lookahead. It tests each location inside the string and returns a match whenever that position is followed by r/, then any 0+ chars other than line break chars (as few as possible), then 1+ word chars that are captured into Group 1. A lookahead consumes no text, so every match is an empty string and $0.range in your code has zero length; that is why you get ["", ""]. You can grab the value by accessing Group 1, but you can also keep your code as it is if you just change the pattern to

"(?<=\\br/)\\w+"

The pattern checks that r/ (with r starting at a word boundary) stands immediately to the left of the current location, and returns only the 1+ word chars that follow this char sequence.
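As a quick check, here is the helper from the question called with the lookbehind pattern. This is only a sketch: the function body is copied from the question with minor tidying, and the `text` constant is just the sample sentence from the question.

```swift
import Foundation

// Helper from the question: returns the full match of every result.
func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let fullRange = NSRange(location: 0, length: nsString.length)
        let results = regex.matches(in: text, options: [], range: fullRange)
        // With a lookbehind, the full match range covers the word itself.
        return results.map { nsString.substring(with: $0.range) }
    } catch {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

let text = "This is an /r/example for /r/stackoverflow"
print(matches(for: "(?<=\\br/)\\w+", in: text))
// prints ["example", "stackoverflow"]
```

Because the lookbehind only asserts what is to the left, the r/ prefix is not part of the returned match.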

See the online regex demo.

Pattern details

  • (?<=\br/) - a positive lookbehind that requires the presence of a
    • \b - word boundary, followed with
    • r/ - an r/ char sequence ... immediately to the left of the current location
  • \w+ - 1 or more word chars (letters, digits or _ char).
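If you would rather keep the original lookahead pattern, the other option mentioned above is to read Group 1 instead of the whole match. A minimal sketch, assuming a hypothetical helper named `captures(for:in:group:)` (not part of the question's code):

```swift
import Foundation

// Hypothetical helper: returns the text of one capture group for every match.
func captures(for pattern: String, in text: String, group: Int = 1) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    let nsString = text as NSString
    let fullRange = NSRange(location: 0, length: nsString.length)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result -> String? in
        // Skip matches where the requested group does not exist or did not participate.
        guard group < result.numberOfRanges else { return nil }
        let range = result.range(at: group)
        guard range.location != NSNotFound else { return nil }
        return nsString.substring(with: range)
    }
}

let sample = "This is an /r/example for /r/stackoverflow"
print(captures(for: "(?=r/.*?(\\w+))", in: sample))
// prints ["example", "stackoverflow"]
```

The whole match is still empty here; only the range of Group 1 carries the word, which is why `result.range(at: 1)` is used rather than `result.range`.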

Upvotes: 3
