Reputation: 1526
I am trying to parse a string with a regex, but I am having trouble extracting all the information into substrings. I am almost done, but I am stuck at this point:
For a string like this:
[00/0/00, 00:00:00] User: This is the message text and any other stuff
I can parse Date, User and Message in Swift with this code:
import Foundation

let line = "[00/0/00, 00:00:00] User: This is the message text and any other stuff"
let result = line.match("(.+)\\s([\\S ]*):\\s(.*\n(?:[^-]*)*|.*)$")

extension String {
    // Returns one array per match: the full match followed by its capture groups.
    func match(_ regex: String) -> [[String]] {
        let nsString = self as NSString
        // NSRange counts UTF-16 units, so use nsString.length rather than the Character count.
        return (try? NSRegularExpression(pattern: regex, options: []))?.matches(in: self, options: [], range: NSMakeRange(0, nsString.length)).map { match in
            (0..<match.numberOfRanges).map { match.range(at: $0).location == NSNotFound ? "" : nsString.substring(with: match.range(at: $0)) }
        } ?? []
    }
}
The resulting array is something like this:
[["[00/0/00, 00:00:00] User: This is the message text and any other stuff","[00/0/00, 00:00:00]","User","This is the message text and any other stuff"]]
Now my problem is this: if the message has a ':' in it, the resulting array does not follow the same format, which breaks the parsing function.
So I think I am missing some cases in the regex. Can anyone help me with this? Thanks in advance.
Upvotes: 1
Views: 664
Reputation: 163632
Your pattern relies on parts that match very broadly.
For example, .+ will first match until the end of the line, [\\S ]* will match either a non-whitespace char or a space, and [^-]* matches any char except a -
The reason it can break is that the broad matches first run to the end of the string. As a single : is mandatory in your pattern, the engine backtracks from the end of the string until it can match a : followed by a whitespace char, and then tries to match the rest of the pattern.
Adding another : (followed by whitespace) in the message part may cause the backtracking to stop earlier than you would expect, making the message group shorter.
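To make the backtracking effect concrete, here is a minimal sketch that runs the pattern from the question against a message containing a second ": ". The helper name firstMatchGroups and the sample line are made up for illustration; they are not from the question.

```swift
import Foundation

// Hypothetical helper (not from the question): returns the full match and
// the capture groups of the first match, or an empty array if nothing matches.
func firstMatchGroups(_ pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }
    let ns = text as NSString
    let fullRange = NSRange(location: 0, length: ns.length)
    guard let m = regex.firstMatch(in: text, options: [], range: fullRange) else { return [] }
    return (0..<m.numberOfRanges).map { i in
        let r = m.range(at: i)
        return r.location == NSNotFound ? "" : ns.substring(with: r)
    }
}

// The pattern from the question, unchanged.
let questionPattern = "(.+)\\s([\\S ]*):\\s(.*\n(?:[^-]*)*|.*)$"

// Sample line (an assumption) whose message contains another ": ".
let sample = "[00/0/00, 00:00:00] User: Note: call me at 10"
let groups = firstMatchGroups(questionPattern, in: sample)

// Group 1 swallows "User:", group 2 becomes "Note",
// and the message group shrinks to "call me at 10".
for (index, group) in groups.enumerated() {
    print("group \(index): \(group)")
}
```

The first group keeps growing until the last whitespace char from which the rest of the pattern can still find a ": ", which is why the user name ends up inside the date group.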
You could make the pattern a bit more precise, so that the last part can also contain : without breaking the groups.
(\[[^][]*\])\s([^:]*):\s(.*)$
(\[[^][]*\]) Match the part from an opening till closing square bracket [...] in group 1
\s Match a whitespace char
([^:]*): Match any char except : in group 2, then match the expected :
\s(.*) Match a whitespace char, and capture 0+ times any char in group 3
$ End of string
Upvotes: 2
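The more precise pattern above could be used from Swift roughly as follows. This is a sketch under assumptions: the ChatLine struct and parseChatLine function are hypothetical names, and the square brackets inside the character class are written escaped ([^\]\[]) as a cautious variation for NSRegularExpression, not as the exact pattern from the answer.

```swift
import Foundation

// Hypothetical container for the three captured parts.
struct ChatLine {
    let date: String
    let user: String
    let message: String
}

// Hypothetical helper: parses one line, or returns nil if it does not match.
func parseChatLine(_ line: String) -> ChatLine? {
    // Same structure as the answer's pattern; the brackets inside the
    // character class are escaped here (an assumption made for safety).
    let pattern = "(\\[[^\\]\\[]*\\])\\s([^:]*):\\s(.*)$"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return nil }
    let ns = line as NSString
    let fullRange = NSRange(location: 0, length: ns.length)
    guard let m = regex.firstMatch(in: line, options: [], range: fullRange) else { return nil }
    // All three groups are mandatory, so their ranges are valid on a match.
    return ChatLine(date: ns.substring(with: m.range(at: 1)),
                    user: ns.substring(with: m.range(at: 2)),
                    message: ns.substring(with: m.range(at: 3)))
}

// Sample line (an assumption) with extra colons in the message.
if let parsed = parseChatLine("[00/0/00, 00:00:00] User: Note: call me at 10") {
    print(parsed.date)     // [00/0/00, 00:00:00]
    print(parsed.user)     // User
    print(parsed.message)  // Note: call me at 10
}
```

Because group 2 stops at the first : after the closing bracket, any further colons stay in the message. The trade-off is that a user name that itself contains a : would not be captured correctly.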