M. Lanza

Reputation: 6790

How to get all Regex matches without regard to groups in Ruby?

I wrote a Ruby script that retrieves a webpage via open-uri and runs a regex against it to locate Bible verses on the page. When I run the same regex in the Chrome Regex Search plugin, the verses are highlighted just as I expect. When I run it in Ruby, not all the verses are selected. I'm pretty sure the issue is that I'm getting submatches based on regex groups when I use scan to get all the matches. How can I ensure that the only matches I get are the ones that match the regex in its entirety? I don't care about the submatches based on groups.

For example, "John 3:16" is the significant match, not its parts "John", "3", "3:16", etc., which result from using groups.

Here's the pertinent code:

rx = Regexp.new("(Genesis|Gen|Ge|Gn|Exodus|Exo|Ex|Exod|Leviticus|Lev|Le|Lv|Numbers|Num|Nu|Nm|Nb|Deuteronomy|Deut|Dt|Joshua|Josh|Jos|Jsh|Judges|Judg|Jdg|Jg|Jdgs|Ruth|Rth|Ru|Ezra|Ezr|Ez|Nehemiah|Neh|Ne|Esther|Esth|Es|Job|Jb|Psalm|Pslm|Ps|Psalms|Psa|Psm|Pss|Proverbs|Prov|Pr|Prv|Ecclesiastes|Eccles|Ec|Song of Solomon|Song|So|Song of Songs|SOS|Isaiah|Isa|Is|Jeremiah|Jer|Je|Jr|Lamentations|Lam|La|Ezekiel|Ezek|Eze|Ezk|Daniel|Dan|Da|Dn|Hosea|Hos|Ho|Joel|Joel|Joe|Jl|Amos|Amo|Am|Obadiah|Obad|Ob|Jonah|Jnh|Jon|Micah|Micah|Mic|Nahum|Nah|Na|Habakkuk|Hab|Zephaniah|Zeph|Zep|Zp|Haggai|Hag|Hg|Zechariah|Zech|Zec|Zc|Malachi|Mal|Ml|Ecclesiastes|Eccl|Ecc|Ec|Jeremiah|Jer|Matthew|Matt|Mt|Mark|Mrk|Mk|Mr|Luke|Luk|Lk|Lu|Acts|Act|Ac|Romans|Rom|Ro|Rm|Galatians|Gal|Ga|Ephesians|Ephes|Eph|Philippians|Phil|Php|Colossians|Col|Titus|Tit|Philemon|Philem|Phm|Phi|Hebrews|Heb|James|Jas|Jm|Ja|Jude|Jud|((1|I|1st|First|2|II|2nd|Second) ?(Samuel|Sam|Sa|Kings|Kgs|Ki|K|Chronicles|Chron|Ch|Corinthians|Cor|Co|Thessalonians|Thess|Thes|Th|Timothy|Tim|Ti|Peter|Pet|Pe|Pt))|(((1|I|1st|First|2|II|2nd|Second|3|III|3rd|Third) ?)?John|Jn|Jhn)).?(,? ?[1-9][0-9]?[0-9]?:[1-9][0-9]?[0-9]?(-[1-9][0-9]?[0-9]?)?)+")
verses = content.scan(rx)

Upvotes: 1

Views: 97

Answers (1)

BroiSatse

Reputation: 44725

Try non-capturing groups:

(?:Genesis|Gen|Ge|...)

This stops scan from returning the captured subgroups instead of the whole match, but I am not 100% sure that's the issue here.
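To illustrate the difference, here is a minimal sketch using a shortened, hypothetical pattern (two book names only, not the full list from the question). It shows how String#scan behaves with capturing versus non-capturing groups, plus one possible way to get whole matches without rewriting the pattern; the variable names are made up for the example.

```ruby
# Hypothetical, shortened patterns for illustration only.
capturing     = /(John|Mark) (\d+):(\d+)/
non_capturing = /(?:John|Mark) \d+:\d+/

text = "See John 3:16 and Mark 1:1."

# With capturing groups, scan returns one array of captures per match.
with_groups = text.scan(capturing)
# => [["John", "3", "16"], ["Mark", "1", "1"]]

# With only non-capturing groups, scan returns the whole matched strings.
whole = text.scan(non_capturing)
# => ["John 3:16", "Mark 1:1"]

# Alternative that leaves the pattern untouched: take group 0 (the full
# match) from the MatchData of each hit.
whole_from_matchdata = text.to_enum(:scan, capturing).map { Regexp.last_match[0] }
# => ["John 3:16", "Mark 1:1"]
```

If rewriting every group in a long pattern is impractical, the last form may be the easier change, since it only touches the line that calls scan.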

Upvotes: 2
