Reputation: 133
How do I extract the transaction receipt date and time from the following HTML with the least amount of noise in my parse rule? (The output I'm looking for is: "Transaction Receipt: 04/28/2011 17:03:09")
<FONT COLOR=DARKBLUE>Transaction Receipt </FONT></TH></TR><TR></TR><TR></TR><TR><TD COLSPAN=4 ALIGN=CENTER><FONT SIZE=-1 COLOR=DARKBLUE>04/28/2011 17:03:09</FONT>
The following works, but it doesn't feel right. A date and time is guaranteed to follow the words "Transaction Receipt" somewhere (although if I were using grep, I wouldn't use a greedy match):
parse d [
    thru {<FONT COLOR=DARKBLUE>Transaction Receipt </FONT></TH></TR><TR></TR><TR></TR><TR><TD COLSPAN=4 ALIGN=CENTER><FONT SIZE=-1 COLOR=DARKBLUE>}
    copy t to "</FONT>"
]
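A shorter variant of the rule above, as a minimal sketch assuming REBOL 2 (where `parse/all` turns off the default whitespace handling): anchor on the label text only, skip to the font tag that wraps the value, and copy up to the closing tag. The sample string `d` is the HTML from the question. The rule still assumes the value sits inside `<FONT SIZE=-1 COLOR=DARKBLUE>`, so it is only as robust as that markup.

```rebol
REBOL []

; Sample input: the HTML fragment from the question
d: {<FONT COLOR=DARKBLUE>Transaction Receipt </FONT></TH></TR><TR></TR><TR></TR><TR><TD COLSPAN=4 ALIGN=CENTER><FONT SIZE=-1 COLOR=DARKBLUE>04/28/2011 17:03:09</FONT>}

t: none
parse/all d [
    thru "Transaction Receipt"             ; anchor on the label text only
    thru {<FONT SIZE=-1 COLOR=DARKBLUE>}   ; skip to the tag that wraps the value
    copy t to "</FONT>"                    ; capture everything up to the closing tag
]
print rejoin ["Transaction Receipt: " t]
```

Because the rule stops at `</FONT>` instead of the end of the input, `parse/all` itself returns false here; only the value copied into `t` matters.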
Upvotes: 0
Views: 265
Reputation: 3718
Timely as usual! If the format is consistent, you can always try to match the date explicitly:
rule: use [dg tag date value][
    tag: use [chars][
        chars: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9" " =-"]
        ["<" opt "/" some chars ">"]
    ]
    date: use [dg mo dy yr tm][
        dg: charset "0123456789"
        [
            copy mo [2 dg "/"] copy dy [2 dg "/"] copy yr 4 dg
            " " copy tm [2 dg ":" 2 dg ":" 2 dg]
            (value: load rejoin [dy mo yr "/" tm])
        ]
    ]
    [
        some [
            "Transaction Receipt" (probe "Transaction Receipt")
            | date (probe value)
            ; everything else
            | some " " | tag ; | skip ; will parse the whole doc...
        ]
    ]
]
Upvotes: 1
Reputation: 11
This is shorter...
parse d [thru <FONT SIZE=-1 COLOR=DARKBLUE> copy t to </FONT>]
but it doesn't specifically look for the date-time pair. Unfortunately, REBOL also considers this date invalid:
>> 04/28/2011
** Syntax Error: Invalid date -- 04/28/2011
** Near: (line 1) 04/28/2011
so you can't match it as a date! value. If the date were 28/04/2011 (and there were a space after the time, though I'm not sure why load needs it), the following would work:
parse load d [to date! copy t to </FONT>]
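Since LOAD expects the day before the month, one workaround is to reorder the parts of the captured string yourself before loading it, the same trick the date rule in the previous answer uses with `rejoin`. A minimal sketch, assuming REBOL 2 (where `parse` with `none` or a string as its rule splits text into a block); `us-to-date` is a hypothetical helper name, not a built-in:

```rebol
REBOL []

; Hypothetical helper: converts a US-style "MM/DD/YYYY hh:mm:ss" string
; into a date! value. Assumes REBOL 2 "simple parse" splitting behaviour.
us-to-date: func [s [string!] /local dt tm parts] [
    set [dt tm] parse s none     ; split on whitespace: date part and time part
    parts: parse dt "/"          ; split the date part: month, day, year
    ; rebuild as day/month/year/time, which LOAD accepts as a date!
    load rejoin [parts/2 "/" parts/1 "/" parts/3 "/" tm]
]

probe us-to-date "04/28/2011 17:03:09"
```

With `t` holding the string copied by the shorter rule at the top of this answer, `us-to-date t` gives a date! value you can compare or reformat.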
Hmmm. Try this...
t: copy ""  ; fresh string each run (a literal "" would be shared and keep growing)
parse d [
    some [
        to "<" thru ">" mark: copy text to "<" (if text [append t text]) :mark
    ]
]
Afterwards, t holds: "Transaction Receipt 04/28/2011 17:03:09"
It works by skipping over each tag and appending whatever text sits between tags to t.
Hope that helps!
Upvotes: 1