Diego Jiménez

Reputation: 1526

Regular expression in Swift

I am trying to parse a string with a regex and am having trouble extracting all of the information into substrings. I am almost done, but I am stuck at this point:

For a string like this:

[00/0/00, 00:00:00] User: This is the message text and any other stuff

I can parse Date, User and Message in Swift with this code:

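import Foundation

extension String {
    // Returns one array per match: the whole match first, then each capture group
    // ("" for a group that did not take part in the match).
    func match(_ regex: String) -> [[String]] {
        let nsString = self as NSString
        // NSRegularExpression works with UTF-16 offsets, so use nsString.length, not count.
        let range = NSMakeRange(0, nsString.length)
        return (try? NSRegularExpression(pattern: regex, options: []))?.matches(in: self, options: [], range: range).map { match in
            (0..<match.numberOfRanges).map { match.range(at: $0).location == NSNotFound ? "" : nsString.substring(with: match.range(at: $0)) }
        } ?? []
    }
}

let line = "[00/0/00, 00:00:00] User: This is the message text and any other stuff"
let result = line.match("(.+)\\s([\\S ]*):\\s(.*\n(?:[^-]*)*|.*)$")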

The resulting array is something like this:

[["[00/0/00, 00:00:00] User: This is the message text and any other stuff","[00/0/00, 00:00:00]","User","This is the message text and any other stuff"]]

My problem is this: if the message contains a ':', the resulting array no longer follows the same format, which breaks my parsing function.

I think I am missing some cases in the regex. Can anyone help me with this? Thanks in advance.

Upvotes: 1

Views: 664

Answers (1)

The fourth bird

Reputation: 163632

Your pattern uses several parts that match very broadly.

For example, .+ will first match until the end of the line, [\\S ]* will match any non-whitespace char or a space, and [^-]* matches any char except a -, including newlines.

The reason it can break is that these broad, greedy parts first match all the way to the end of the string. Because your pattern requires a : followed by a whitespace char, the engine then backtracks from the end of the string until it finds one, and only then tries to match the rest of the pattern.

If the message part contains another : followed by a whitespace char, backtracking stops at that later colon, sooner than you would expect. The groups before it then absorb part of the message, so the message group becomes shorter.
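The backtracking described above can be reproduced with a minimal sketch that runs the question's original pattern on two sample lines, one of which has an extra ": " inside the message. The helper captureGroups(_:in:) and the sample strings are hypothetical, written only for this illustration; the sketch relies on nothing beyond Foundation's NSRegularExpression.

```swift
import Foundation

// Hypothetical helper: returns the capture groups (not group 0) of the first match,
// or nil if the pattern does not compile or does not match.
func captureGroups(_ pattern: String, in text: String) -> [String]? {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []),
          let match = regex.firstMatch(in: text, options: [],
                                       range: NSRange(text.startIndex..., in: text))
    else { return nil }
    return (1..<match.numberOfRanges).map { index in
        // A group that did not participate has no valid range; report it as "".
        guard let range = Range(match.range(at: index), in: text) else { return "" }
        return String(text[range])
    }
}

// The pattern from the question, unchanged.
let originalPattern = "(.+)\\s([\\S ]*):\\s(.*\n(?:[^-]*)*|.*)$"

let plain = "[00/0/00, 00:00:00] User: Hello"
let withColon = "[00/0/00, 00:00:00] User: Note: Hello"

print(captureGroups(originalPattern, in: plain) ?? [])
// ["[00/0/00, 00:00:00]", "User", "Hello"]
print(captureGroups(originalPattern, in: withColon) ?? [])
// ["[00/0/00, 00:00:00] User:", "Note", "Hello"]
```

In the second line the greedy (.+) backtracks only as far as the last ": ", so "User:" ends up in group 1 and "Note" is reported as the user.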


You could make the pattern a bit more precise, so that the last part can also contain : without breaking the groups.

 (\[[^][]*\])\s([^:]*):\s(.*)$
  • (\[[^][]*\]) Match the part from an opening to a closing square bracket [...] in group 1 ([^][]* matches any char except [ and ])
  • \s Match a whitespace char
  • ([^:]*): Match any char except : in group 2, then match the expected :
  • \s(.*) Match a whitespace char, then capture zero or more of any char except a newline in group 3
  • $ End of string
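A minimal Swift sketch of the suggested pattern in use, assuming each chat line is a single line of text. The ChatLine struct and parseChatLine function are hypothetical names for this illustration. In the Swift string literal every backslash is doubled, and both brackets inside the negated class are escaped ([^\\[\\]]) so the pattern does not rely on how a particular engine treats a leading ] in a character class.

```swift
import Foundation

// Hypothetical container for one parsed chat line.
struct ChatLine: Equatable {
    let timestamp: String
    let user: String
    let message: String
}

// The suggested pattern as a Swift string literal, with the class brackets escaped.
let chatPattern = "(\\[[^\\[\\]]*\\])\\s([^:]*):\\s(.*)$"
// try! is acceptable here only because the pattern is a constant known to compile.
let chatRegex = try! NSRegularExpression(pattern: chatPattern, options: [])

// Hypothetical parser: returns nil when the line does not have the expected shape.
func parseChatLine(_ line: String) -> ChatLine? {
    let fullRange = NSRange(line.startIndex..., in: line)
    guard let match = chatRegex.firstMatch(in: line, options: [], range: fullRange),
          let timestamp = Range(match.range(at: 1), in: line),
          let user = Range(match.range(at: 2), in: line),
          let message = Range(match.range(at: 3), in: line)
    else { return nil }
    return ChatLine(timestamp: String(line[timestamp]),
                    user: String(line[user]),
                    message: String(line[message]))
}

if let parsed = parseChatLine("[00/0/00, 00:00:00] User: Note: Hello") {
    print(parsed.timestamp) // [00/0/00, 00:00:00]
    print(parsed.user)      // User
    print(parsed.message)   // Note: Hello
}
```

Because group 2 is [^:]*, the user part stops at the first colon, so any later ": " stays in the message. This assumes user names never contain a colon themselves.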

Regex demo

Upvotes: 2
