Patrick Jollain

Reputation: 65

NSRegularExpression and repeating pattern

I am writing an app that receives message blocks via TCP. A message block is composed of a <<:--!! marker, a six-digit message length, and the message itself.

It seemed logical to use NSRegularExpression to extract the messages from the received data, so I ended up with the following code in a playground, which processes one string of received data:

import UIKit

struct Constants {
    static let messageHeaderPattern = "<<:--!!(\\d{6})(.+)"
}

let receivedData = "<<:--!!000010My message"

let regex = try! NSRegularExpression(pattern: Constants.messageHeaderPattern, options: [])  // Define the regular expression
let range = NSMakeRange(0, receivedData.characters.count)                          // Define the range (all the string)
let matches = regex.matchesInString(receivedData, options: [], range: range)       // Get the matches

print("Number of matches: \(matches.count)")

for match in matches {

    let locationOfMessageLength = match.rangeAtIndex(1).location
    let expectedLengthOfMessage = Int(receivedData.substringWithRange(Range(start: receivedData.startIndex.advancedBy(locationOfMessageLength),
        end: receivedData.startIndex.advancedBy(locationOfMessageLength + 6))))

    let locationOfMessage = match.rangeAtIndex(2).location
    let lengthOfMessage = match.rangeAtIndex(2).length
    let data = receivedData.substringWithRange(Range(start: receivedData.startIndex.advancedBy(locationOfMessage),
        end: receivedData.startIndex.advancedBy(locationOfMessage + lengthOfMessage)))

    // data contains "My message"

}

This code works well, but only if there is one message in the string. To make it work for multiple messages, I changed the regular expression:

static let messageHeaderPattern = "(?:<<:--!!(\\d{6})(.+))+"

and the received data:

let receivedData = "<<:--!!000010My message<<:--!!000014Second message"

But there is still only one match, and data contains My message<<:--!!000014Second message.

What is wrong with my regular expression?

Upvotes: 1

Views: 336

Answers (3)

trapper

Reputation: 11993

The message itself could contain <<:--!!\d{6}, so I don't think you will be able to do this with a regex alone. The safe solution is:

  1. regex for ^<<:--!!(\d{6}) to extract the length N
  2. take the N characters that follow the 13-character header (the 7-character marker plus the 6 digits)
  3. repeat
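The three steps above can be sketched roughly as follows. This is only an illustration in current Swift syntax, not code from the question: extractMessages is a hypothetical helper name, a plain prefix check stands in for the ^<<:--!!(\d{6}) regex, and it assumes the six-digit length counts characters (if the protocol counts bytes, the same loop would have to run over Data instead).

```swift
import Foundation

// Hypothetical helper (not from the question): walks the buffer header by
// header and trusts the six-digit length instead of searching for the next marker.
func extractMessages(from buffer: String) -> [String] {
    let marker = "<<:--!!"
    let lengthDigits = 6
    var messages: [String] = []
    var rest = Substring(buffer)

    while rest.hasPrefix(marker) {
        // Step 1: read the six digits that follow the marker.
        let afterMarker = rest.dropFirst(marker.count)
        let digits = afterMarker.prefix(lengthDigits)
        guard digits.count == lengthDigits,
              digits.allSatisfy({ $0.isASCII && $0.isNumber }),
              let length = Int(digits) else { break }   // malformed or truncated header

        // Step 2: take exactly `length` characters after the header.
        let body = afterMarker.dropFirst(lengthDigits)
        guard body.count >= length else { break }       // message not fully received yet
        messages.append(String(body.prefix(length)))

        // Step 3: repeat with whatever follows the message.
        rest = body.dropFirst(length)
    }
    return messages
}

let received = "<<:--!!000010My message<<:--!!000014Second message"
print(extractMessages(from: received))
```

Because the body is cut by length, a message that happens to contain the marker text is still returned intact. The sketch simply stops at an incomplete trailing block; a real receiver would presumably keep those bytes and retry once more data arrives.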

If you want to live dangerously and are confident that <<:--!!\d{6} will never occur inside a message, then this regex will do the trick:

(?<=<<:--!!\d{6})(.*?)(?=<<:--!!\d{6}|$)

Just remember that it will break if the delimiter occurs inside a message. To be safe, use the length-based method from my first example.
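For reference, here is one way the lookaround pattern might be applied with NSRegularExpression. It is a minimal sketch in current Swift syntax (the question uses the older Swift 2 method names), and the variable names are chosen only for illustration.

```swift
import Foundation

// The lookaround pattern from this answer, with backslashes escaped for a Swift string literal.
let pattern = "(?<=<<:--!!\\d{6})(.*?)(?=<<:--!!\\d{6}|$)"
let received = "<<:--!!000010My message<<:--!!000014Second message"

let regex = try! NSRegularExpression(pattern: pattern, options: [])

// NSRegularExpression works in UTF-16 units, so build the NSRange from the string itself.
let fullRange = NSRange(received.startIndex..., in: received)

// Capture group 1 holds the text between one header and the next (or the end of the string).
let bodies: [String] = regex.matches(in: received, options: [], range: fullRange).compactMap { match in
    Range(match.range(at: 1), in: received).map { String(received[$0]) }
}

print(bodies)
```

Note that the declared length is never consulted here, which is why a message containing the marker followed by six digits would be split in the wrong place.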

Upvotes: 1

Aferrercrafter

Reputation: 439

Try restricting what the message group is allowed to match, so that (.+) does not swallow the second message:

"(?:<<:--!!(\\d{6})([a-zA-Z ]+))"
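A quick way to check this pattern against the sample data is sketched below, in current Swift syntax. The parsed array and its tuple labels are made up for the illustration; the outer non-capturing group is dropped because it does not change what is matched. The character class only accepts letters and spaces, so this assumes messages never contain digits or punctuation.

```swift
import Foundation

let pattern = "<<:--!!(\\d{6})([a-zA-Z ]+)"
let received = "<<:--!!000010My message<<:--!!000014Second message"

let regex = try! NSRegularExpression(pattern: pattern, options: [])
let nsReceived = received as NSString
let matches = regex.matches(in: received, options: [],
                            range: NSRange(location: 0, length: nsReceived.length))

// Pair each declared length (group 1) with the text actually captured (group 2).
let parsed = matches.map { match -> (declared: Int, text: String) in
    let declared = Int(nsReceived.substring(with: match.range(at: 1))) ?? -1
    let text = nsReceived.substring(with: match.range(at: 2))
    return (declared: declared, text: text)
}

for item in parsed {
    print("declared \(item.declared), captured \(item.text.count): \(item.text)")
}
```

Comparing the declared length with the captured length gives a cheap sanity check that the character class did not cut a message short.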

Upvotes: 0

cimarron

Reputation: 431

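Try using the pattern static let messageHeaderPattern = "<<:--!!(\\d{6})(.+?)(?=<<:--!!|$)". The lazy (.+?) stops at the next <<:--!! marker or at the end of the string; with a negative lookahead (?!<<:--!!) in that position, the lazy group would stop after a single character.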

Upvotes: 0
