Reputation: 40
I am working on a program that makes stratigraphic columns for geologists. The geologists code each rock unit using four parameters (five characters in total): (1) a lithology code (2 characters), (2) a primary code (1 character), (3) a secondary code (1 character), and (4) a tertiary code (1 character). So a rock unit can be coded like:
Ssxrs - making it a rooted and cross-bedded sandstone with a sharp basal contact.
It is easy to parse out 2 characters, 1 character, 1, and 1. But the geologists sometimes code the rock unit like:
Gr-Ss --- where the unit grades upward from a conglomerate to a sandstone, or
Gr/Ss --- where the conglomerate and sandstone are interbedded.
They can do this multiple times like:
Gr-Ss/Ls --- where a conglomerate grades upward to an interbedded sandstone and limestone. Not only do they do this for the lithology codes but also for the primary, secondary, and tertiary codes.
I would like to parse the four code streams and the actions (i.e. "/" and "-") into a lithology list/array, primary list/array, secondary list/array, and tertiary list/array.
Is this a regex solvable problem?
Upvotes: 0
Views: 97
Reputation: 1478
The regex:
((?:[A-Za-z]{2}[-\/])*[A-Za-z]{2})((?:[A-Za-z][-\/])*[A-Za-z])((?:[A-Za-z][-\/])*[A-Za-z])((?:[A-Za-z][-\/])*[A-Za-z])
will let you capture the four different codes in four different groups: http://rubular.com/r/Y7rlT09soH
Some explanation. The first capturing group:
((?:[A-Za-z]{2}[-\/])*[A-Za-z]{2})
matches zero or more repetitions of 2 letters followed by a "-" or a "/", and then a final 2 letters. (The "?:" makes the inner group non-capturing.)
The next three capturing groups are identical:
((?:[A-Za-z][-\/])*[A-Za-z])
They do the same as the first one, but with only one letter per code.
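To get from the four captured groups to the lists the question asks for, each group can be split again on its separators. Here is a minimal Python sketch of that idea. The function name parse_unit, the dictionary layout, and the choice to reject anything that does not match the whole string are my own assumptions, not something from the question.

```python
import re

# Same pattern as above, split over four lines (one capturing group per code stream).
UNIT_RE = re.compile(
    r"((?:[A-Za-z]{2}[-/])*[A-Za-z]{2})"  # lithology: 2-letter codes
    r"((?:[A-Za-z][-/])*[A-Za-z])"        # primary: 1-letter codes
    r"((?:[A-Za-z][-/])*[A-Za-z])"        # secondary: 1-letter codes
    r"((?:[A-Za-z][-/])*[A-Za-z])"        # tertiary: 1-letter codes
)

STREAMS = ("lithology", "primary", "secondary", "tertiary")


def parse_unit(code):
    """Split a rock unit code into codes and actions per stream (hypothetical helper)."""
    match = UNIT_RE.fullmatch(code)  # assumption: the whole string must be one unit
    if match is None:
        raise ValueError(f"not a valid rock unit code: {code!r}")
    result = {}
    for name, stream in zip(STREAMS, match.groups()):
        # Splitting on a capturing group keeps the separators in the result,
        # so codes sit at even indexes and actions at odd indexes.
        tokens = re.split(r"([-/])", stream)
        result[name] = {"codes": tokens[0::2], "actions": tokens[1::2]}
    return result


if __name__ == "__main__":
    print(parse_unit("Gr-Ss/Lsxrs"))
```

For "Gr-Ss/Lsxrs" the lithology entry holds the codes Gr, Ss, Ls and the actions "-", "/", while the other three streams each hold a single code and no actions. Keeping codes and actions in parallel lists means actions[i] is always the operator between codes[i] and codes[i + 1].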
Upvotes: 1