Reputation: 271775
In the WWDC videos, it was shown that you can do something like this with Captures/TryCaptures in the Regex Builder:
let regex = Regex {
    // ...
    TryCapture {
        OneOrMore(.digit)
    } transform: {
        Int($0)
    }
    // ...
}
And the output of the Regex will be type safe. The Regex will output an Int for that group, instead of a Substring like it normally does.
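To make the typed capture concrete, here is a minimal sketch of a complete regex built around that TryCapture. The names ageRegex and parseAge are made up for illustration, and it assumes Swift 5.7+ with RegexBuilder available (macOS 13 / iOS 16 or later).

```swift
import RegexBuilder

// Matches a parenthesised number such as "(30)".
// The inferred type is Regex<(Substring, Int)>: the whole match plus the transformed capture.
let ageRegex = Regex {
    "("
    TryCapture {
        OneOrMore(.digit)
    } transform: {
        Int($0) // returning nil here makes the match fail at this point
    }
    ")"
}

// Hypothetical helper: returns the typed capture, or nil when the text does not match.
func parseAge(_ text: String) -> Int? {
    text.wholeMatch(of: ageRegex)?.output.1
}
```

Because the transform returns an optional, digits that do not fit in an Int make the match fail rather than crash.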
However, what I would like to do is change the output type of the whole Regex, like applying a transform: at the end of the Regex closure. For example, to parse a line containing the name, age and date of birth of a person:
John (30) 1992-09-22
I would like to do something like:
// this doesn't work and is just for illustration - there is no such Regex.init
let regex = Regex {
    Capture(/\w+/)
    " ("
    TryCapture(/\d+/) { Int($0) }
    ") "
    Capture(.iso8601Date(timeZone: .gmt))
} transform: { (_, name, age, dob) in
    Person(name: String(name), age: age, dob: dob)
}
And I would expect regex to be of type Regex<Person>, and not Regex<(Substring, Substring, Int, Date)>. That is, someString.wholeMatch(of: regex).output would be a Person, not a tuple.
I'm basically just trying to reduce the occurrence of tuples, because I find it very inconvenient to work with them, especially unnamed ones. Since RegexComponent is parameterised by the unconstrained RegexOutput type, and there are built-in types where RegexOutput is Date and Decimal, surely doing this for arbitrary types using regex is not impossible, right?
My attempt was:
import Foundation
import RegexBuilder

struct Person {
    let name: String
    let age: Int
    let dob: Date
}

let line = "John (30) 1992-09-22"
let regex = Regex {
    Capture {
        Capture(/\w+/)
        " ("
        TryCapture(/\d+/) { Int($0) }
        ") "
        Capture(.iso8601Date(timeZone: .gmt))
    } transform: { (_, name, age, dob) in
        Person(name: String(name), age: age, dob: dob)
    }
}
line.wholeMatch(of: regex)
but this crashed at runtime, giving the message:
Could not cast value of type 'Swift.Substring' (0x7ff865e3ead8) to '(Swift.Substring, Swift.Substring, Swift.Int, Foundation.Date)' (0x7ff863f2e660).
Another attempt of mine using CustomConsumingRegexComponent is shown here in this answer, but that has quite a large caveat, namely that it doesn't backtrack properly.
How can I create a Regex that outputs my own type?
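For readers who have not seen the CustomConsumingRegexComponent approach mentioned above, a rough sketch of the idea follows. This is a reconstruction under assumptions, not the code from the linked answer: PersonComponent, personFields and parsePerson are hypothetical names, and the inner regex uses OneOrMore(.word) and OneOrMore(.digit) instead of regex literals so that no compiler flag is needed. It assumes Swift 5.7+ on macOS 13 / iOS 16 or later.

```swift
import Foundation
import RegexBuilder

struct Person {
    let name: String
    let age: Int
    let dob: Date
}

// Ordinary tuple-producing regex: (Substring, Substring, Int, Date).
let personFields = Regex {
    Capture { OneOrMore(.word) }
    " ("
    TryCapture { OneOrMore(.digit) } transform: { Int($0) }
    ") "
    Capture(.iso8601Date(timeZone: .gmt))
}

// Hypothetical component whose RegexOutput is Person.
struct PersonComponent: CustomConsumingRegexComponent {
    typealias RegexOutput = Person

    func consuming(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) throws -> (upperBound: String.Index, output: Person)? {
        // Run the inner regex as a prefix match from the current position.
        guard let match = input[index..<bounds.upperBound].prefixMatch(of: personFields) else {
            return nil
        }
        let (_, name, age, dob) = match.output
        return (match.range.upperBound, Person(name: String(name), age: age, dob: dob))
    }
}

// Hypothetical helper: the output of the match is a Person, not a tuple.
func parsePerson(_ line: String) -> Person? {
    line.wholeMatch(of: PersonComponent())?.output
}
```

The component reports only one end position per starting index, so the enclosing regex cannot ask it to try a shorter or longer match. That is the backtracking caveat mentioned above.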
Reputation: 4204
From what I have read/seen in samples (e.g. swift-regex), it might be a good idea to create a regex component similar to .word, .digit, but nesting captures does not seem to work easily.
Here is an example run in the playground to create a Person struct instance:
import Foundation
import RegexBuilder

func regexBuilderMatching(string: String = "John (30) 1992-09-22") {
    struct Person: CustomStringConvertible {
        let name: String
        let age: Int
        let dob: Date

        // Formats dob as yyyy-MM-dd (e.g. 1992-09-22) in the current time zone.
        func dobToFormatterString() -> String {
            let dateFormatter = DateFormatter()
            dateFormatter.locale = Locale(identifier: "en_US_POSIX")
            dateFormatter.dateFormat = "yyyy-MM-dd"
            return dateFormatter.string(from: self.dob)
        }

        var description: String {
            return "\(name), age: \(age), has dob: \(dobToFormatterString())"
        }
    }

    // Parses a yyyy-MM-dd string (e.g. 1992-09-22) in the current time zone.
    func dateFromString(dateString: String) -> Date? {
        let formatter = DateFormatter()
        formatter.locale = Locale(identifier: "en_US_POSIX")
        formatter.dateFormat = "yyyy-MM-dd"
        return formatter.date(from: dateString)
    }

    let regexWithBasicCapture = Regex {
        /* 1. */ Capture { OneOrMore(.word) }
        /* 2. */ " ("
        /* 3. */ TryCapture { OneOrMore(.digit) }
                 transform: { match in
                     Int(match)
                 }
        /* 4. */ ") "
        // The transform receives the matched text, which dateFromString then parses.
        /* 5. */ TryCapture { OneOrMore(.iso8601Date(timeZone: .gmt)) }
                 transform: { match in
                     dateFromString(dateString: String(match))
                 }
    }

    let matches = string.matches(of: regexWithBasicCapture)
    for match in matches {
        // shorthand syntax using match output
        // https://developer.apple.com/documentation/swift/regex/match
        let (_, name, age, date) = match.output
        let person = Person(name: String(name), age: age, dob: date)
        print(person)
    }
}
The above code will output:
John, age: 30, has dob: 1992-09-22
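If the goal is mainly to keep tuples away from call sites, the mapping step in the loop above could be folded into a small wrapper. The sketch below is one possible shape, not an API from RegexBuilder: MappedRegex, personParser and its wholeMatch(in:) method are hypothetical names, and the result is a plain value rather than a Regex<Person>, so it cannot be nested inside another Regex builder. It assumes Swift 5.7+ on macOS 13 / iOS 16 or later.

```swift
import Foundation
import RegexBuilder

struct Person {
    let name: String
    let age: Int
    let dob: Date
}

// Hypothetical wrapper: pairs a regex with a function that maps its output tuple to a value.
struct MappedRegex<Output, Value> {
    let regex: Regex<Output>
    let transform: (Output) -> Value

    // Returns the mapped value when the whole string matches, otherwise nil.
    func wholeMatch(in string: String) -> Value? {
        string.wholeMatch(of: regex).map { transform($0.output) }
    }
}

let personParser = MappedRegex(
    regex: Regex {
        Capture { OneOrMore(.word) }
        " ("
        TryCapture { OneOrMore(.digit) } transform: { Int($0) }
        ") "
        Capture(.iso8601Date(timeZone: .gmt))
    },
    // output.0 is the whole match; .1, .2 and .3 are name, age and date of birth.
    transform: { output in
        Person(name: String(output.1), age: output.2, dob: output.3)
    }
)
```

With this in place, personParser.wholeMatch(in: "John (30) 1992-09-22") yields a Person?, and the tuple only appears once, inside the transform.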