Todd Thompson

Reputation: 40

Is it possible to parse this code using regex?

I am working on a program that makes stratigraphic columns for geologists. The geologists code rock units using 4 parameters (5 characters in total): (1) a lithology code (2 characters), (2) a primary code (1 character), (3) a secondary code (1 character), and (4) a tertiary code (1 character). So a rock unit can be coded like:

Ssxrs - making it a rooted and cross-bedded sandstone with a sharp basal contact.

It is easy to parse out 2 characters, then 1, 1, and 1. But the geologists sometimes code the rock unit like:

Gr-Ss --- where the unit grades upward from a conglomerate to a sandstone, or

Gr/Ss --- where the conglomerate and sandstone are interbedded.

They can do this multiple times like:

Gr-Ss/Ls --- where a conglomerate grades upward to an interbedded sandstone and limestone. Not only do they do this for the lithology codes but also for the primary, secondary, and tertiary codes.

I would like to parse out the 4 code streams and actions (i.e. "/" and "-") into a lithology list/array, primary list/array, secondary list/array, and tertiary list/array.

Is this a regex solvable problem?

Upvotes: 0

Views: 97

Answers (1)

Pilou

Reputation: 1478

The regex:

((?:[A-Za-z]{2}[-\/])*[A-Za-z]{2})((?:[A-Za-z][-\/])*[A-Za-z])((?:[A-Za-z][-\/])*[A-Za-z])((?:[A-Za-z][-\/])*[A-Za-z])

will allow you to find the 4 different codes in 4 different groups: http://rubular.com/r/Y7rlT09soH
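As a rough illustration of how the pattern could be used from code, here is a minimal Python sketch. The function name `split_unit` and the sample string `Gr-Ss/Lsxrs` are made up for the example, and `re.fullmatch` is used on the assumption that each string holds exactly one rock unit. In Python the `/` does not need escaping, so `[-\/]` is written as `[-/]`.

```python
import re

# The pattern from the answer, split across lines for readability.
UNIT_RE = re.compile(
    r"((?:[A-Za-z]{2}[-/])*[A-Za-z]{2})"  # group 1: lithology (2-letter codes)
    r"((?:[A-Za-z][-/])*[A-Za-z])"        # group 2: primary (1-letter codes)
    r"((?:[A-Za-z][-/])*[A-Za-z])"        # group 3: secondary
    r"((?:[A-Za-z][-/])*[A-Za-z])"        # group 4: tertiary
)


def split_unit(code):
    """Return the four raw code streams, or None if the string does not fit."""
    m = UNIT_RE.fullmatch(code)
    return m.groups() if m else None


print(split_unit("Ssxrs"))        # ('Ss', 'x', 'r', 's')
print(split_unit("Gr-Ss/Lsxrs"))  # ('Gr-Ss/Ls', 'x', 'r', 's')
```

Each returned string still contains its "-" and "/" actions, so a second step is needed to break a stream into separate codes and actions.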

Some explanation. The first capturing group:

((?:[A-Za-z]{2}[-\/])*[A-Za-z]{2})

matches zero or more repetitions of 2 letters followed by a "-" or a "/", and then a final 2 letters, and captures the whole run. (The "?:" makes the inner group non-capturing.)

The next 3 capturing groups are identical to each other:

((?:[A-Za-z][-\/])*[A-Za-z])

They do the same as the first one, but with single letters instead of pairs of letters.
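To get from one captured stream to the lists asked for in the question, one option is to split the stream on the action characters while keeping them. The sketch below is only a suggestion: `split_stream` is a hypothetical helper name, and it assumes the stream has already been validated by the regex above, so codes and actions strictly alternate.

```python
import re


def split_stream(stream):
    """Split one code stream into (codes, actions).

    Because the separator pattern is wrapped in a capturing group,
    re.split keeps the "-" and "/" characters in the result list.
    """
    parts = re.split(r"([-/])", stream)
    codes = parts[0::2]    # even positions: the codes themselves
    actions = parts[1::2]  # odd positions: the "-" or "/" between them
    return codes, actions


print(split_stream("Gr-Ss/Ls"))  # (['Gr', 'Ss', 'Ls'], ['-', '/'])
print(split_stream("x"))         # (['x'], [])
```

Running this on each of the 4 groups gives a code list and an action list for lithology, primary, secondary, and tertiary.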

Upvotes: 1
