Reputation: 33607
First I'll state my intention: I'm trying to write a parse rule for potentially obfuscated 10-digit phone numbers. So imagine cases like "callmeNOW...555___555____5555!"
The place I thought to start from is Wikipedia's list of valid area codes. I then transform these into pipe-delimited STRING! values to use as a parse rule, which I seek up TO.
After that I thought I'd try just reading the next 10 digits. But a proof-of-concept I expected to work did not:
>> digit: charset "0123456789"
== make bitset! #{000000000000FFC0}
>> parse "callmeNOW...555___555____5555!" [10 [thru digit] to end]
== false
That led me to a simpler test, which also failed to do what I wanted:
>> parse "5" [thru digit]
== false
If I don't use it with THRU, then it does the expected thing:
>> parse "5" [digit]
== true
Why would bitsets be supported when used as tokens but not with TO or THRU?
P.S. here's the list of area codes as a Rebol block if anyone wants it...
valid-area-codes: [
205 251 256 334 659 938 907 250 480 520 602 623 928 327 479 501 870
209 213 310 323 341 369 408 415 424 442 510 530 559 562 619 626 627
628 650 657 661 669 707 714 747 760 764 805 818 831 858 909 916 925
935 949 951 303 719 720 970 203 475 860 959 302 202 239 305 321 352
386 407 561 689 727 754 772 786 813 850 863 904 941 954 229 404 470
478 678 706 762 770 912 808 208 217 224 309 312 331 447 464 618 630
708 730 773 779 815 847 872 219 260 317 574 765 812 319 515 563 641
712 316 620 785 913 270 364 502 606 859 225 318 337 504 985 207 227
240 301 410 443 667 339 351 413 508 617 774 781 857 978 231 248 269
313 517 586 616 679 734 810 906 947 989 218 320 507 612 651 763 952
228 601 662 769 314 417 557 573 636 660 816 975 406 308 402 531 702
775 603 201 551 609 732 848 856 862 908 973 505 575 212 315 347 516
518 585 607 631 646 716 718 845 914 917 929 252 336 704 828 910 919
980 984 701 216 234 283 330 380 419 440 513 567 614 740 937 405 539
580 918 458 503 541 971 215 267 272 412 445 484 570 582 610 717 724
814 835 878 401 803 843 864 605 423 615 731 865 901 931 210 214 254
281 325 361 409 430 432 469 512 682 713 737 806 817 830 832 903 915
936 940 956 972 979 385 435 801 802 276 434 540 571 703 757 804 206
253 360 425 509 564 304 681 262 274 414 534 608 715 920 307
]
Upvotes: 2
Views: 90
Reputation: 1503
The real answer is that TO and THRU are not pattern-matching operators, but search operators.
Charsets are used specifically for pattern matching. A charset is a bitmask: it doesn't exist in the stream, so it can't be searched for; it filters and compares the stream one byte at a time. This doesn't mean charsets couldn't be made to work with TO/THRU in the future, only that they weren't meant to be used that way so far.
The reason your "pipe rule" example works (it is in fact a list of alternatives using the OR ("|") operation) is that TO/THRU will search for each alternative in the given order until one is matched (so the ordering can affect speed in a long list of alternatives).
The following rule would extract all relatively well-formed phone numbers, allowing for some "wiggle room" in form (R2 & R3):
digit: charset "0123456789"
!digit: complement digit
parse/all text [
    any [
        copy phone [opt "(" 3 digit 1 3 !digit 3 digit 1 3 !digit 4 digit] (print phone)
        | skip
    ]
]
Note that the above is a quick-and-dirty number extractor for any North American phone number. It will scan the whole text, printing any number it finds.
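To illustrate the search-versus-match distinction with the string from the question: since a bitset can only be matched, not searched for, the effect of THRU can be approximated by matching zero or more non-digits followed by one digit. This is only a minimal sketch, assuming Rebol 2 and parse/all; the name non-digit is introduced here purely for illustration.

```rebol
digit: charset "0123456789"
non-digit: complement digit   ; hypothetical name: every character that is not a digit

; [any non-digit digit] stands in for [thru digit], using matching only:
; skip any run of non-digits, then consume exactly one digit.
print parse/all "callmeNOW...555___555____5555!" [10 [any non-digit digit] to end]
```

For this input the rule should succeed, since the string contains ten digits and the trailing TO END accepts whatever follows the tenth one.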
Upvotes: 1
Reputation: 3718
Of course, while R3 does move parse along a bit, the usual pattern is to create a complementary bitset to skip content outwith the desired set.
filler: complement digit: charset "0123456789"
parse/all stream [any [copy n some digit (append out: [] n) | some filler]]
probe out ; all the digit chunks in the stream
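Applied to the string from the question, the same complementary-bitset pattern can be used to strip the filler and then check whether ten digits remain. This is a rough sketch under the assumption of Rebol 2; the word digits is a hypothetical accumulator that is not part of the rule above.

```rebol
digit: charset "0123456789"
filler: complement digit
digits: copy ""   ; hypothetical accumulator for the de-obfuscated number

parse/all "callmeNOW...555___555____5555!" [
    ; keep every run of digits, skip every run of anything else
    any [copy n some digit (append digits n) | some filler]
]

print digits                 ; the digits with all filler removed
print 10 = length? digits    ; whether exactly ten digits were found
```

Checking the length afterwards keeps the parse rule itself simple, at the cost of not stopping early once ten digits have been seen.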
Upvotes: 1
Reputation: 33607
Not succinct, but if I don't construct my digit set as a BITSET! but rather with a pipe rule, it works fine:
>> digit: ["0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"]
== ["0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"]
>> parse "callmeNOW...555___555____5555!" [10 [thru digit] to end]
== true
Upvotes: 0