Matching Patterns with Text in Between

Question

In quantitative verse (like what's used in Greek and Latin poetry), lines are split into parts called spondees and dactyls. A dactyl is a long vowel (like ā) followed by two short vowels, while a spondee is two long vowels.

My goal is to automate the splitting of lines into spondees and dactyls in Python.

Given a line like

ārma virūmqe canō

I'm trying to get the output

arma vi / rūmque ca / nō

I've been thinking that using a regex to find either the pattern (long,short,short), or (long,long) would be a good idea, but I can't seem to figure out how to deal with the fact that these vowels are rarely going to be consecutive, and that the number of consonants between them will vary every time.

Is there a way to look for specific characters with an arbitrary number of other, irrelevant characters between them, using a regex? If not, is there another, relatively elegant way to achieve the same goal?

Edit:

If you need more examples, @Junuxx pointed out a great site. Here's a link to a picture of the scansion of the first 7 lines of the Aeneid, from which I got the example above. Every time there are just two vowels in a segment, it's a spondee. If there are three, it's a dactyl. Ignore the bolded lines, as they just indicate the third division in a line.

Edit II:

Looks like I made a typo in my example. I wrote "virumqe" when, in reality, the line is "virumque". In Latin, (ae,au,ei,eu,oe) are diphthongs, and are treated as one vowel. I suppose, then, that I must amend my question to ask if it's possible to deal with those as well.

Junuxx · Accepted Answer

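The code below works on your example. The regex is rather long, however, since there's no concise way to match consonants: the negated class [^āēīōūaeiou] stands in for them, and it also matches spaces, which is what lets a foot run across a word boundary.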

Breakdown of the regex for a dactyl:

 [^āēīōūaeiou]*  # 0 or more consonants
 [āēīōū]         # a long vowel
 [^āēīōūaeiou]*  # 0 or more consonants
 [aeiou]         # a short vowel
 [^āēīōūaeiou]*  # 0 or more consonants
 [aeiou]         # a short vowel 
 [^āēīōūaeiou]*? # 0 or more consonants, but as few as possible
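The breakdown above can be written almost verbatim with re.VERBOSE, which lets the pattern carry its own comments. This is only a minimal sketch, assuming Python 3 and precomposed macron characters; the names CONS and DACTYL are made up for illustration.

```python
import re

# Hypothetical helper names; the character classes are the ones from the breakdown.
CONS = "[^āēīōūaeiou]"   # any non-vowel: consonants, but also spaces

DACTYL = re.compile(rf"""
    {CONS}*      # 0 or more consonants
    [āēīōū]      # a long vowel
    {CONS}*      # 0 or more consonants
    [aeiou]      # a short vowel
    {CONS}*      # 0 or more consonants
    [aeiou]      # a short vowel
    {CONS}*?     # 0 or more consonants, but as few as possible
""", re.VERBOSE)

first_foot = DACTYL.match("ārma virūmqe canō").group()
print(first_foot)  # ārma vi
```

Because the final quantifier is lazy, the trailing consonants are left for the start of the next foot.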

Code:

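# Python 3

import re
s = "ārma virūmqe canō"
# Long vowels: āēīōū

m = re.findall('([^āēīōūaeiou]*[āēīōū][^āēīōūaeiou]*'  # Dactyls
               '[aeiou][^āēīōūaeiou]*[aeiou][^āēīōūaeiou]*?'
               '|'
               # Spondees: both long vowels are required here, because a lazy
               # gap followed by an optional vowel never reaches the second one
               '[^āēīōūaeiou]*[āēīōū][^āēīōūaeiou]*'
               '[āēīōū][^āēīōūaeiou]*?'
               '|'
               '[^āēīōūaeiou]*[āēīōū][^āēīōūaeiou]*'   # A lone long vowel
               '|'
               r'[\w\s]*)', s)                         # Catch all leftovers

# findall also reports an empty match at the end of the string; drop it
feet = [foot for foot in m if foot]
if feet:
    print(' / '.join(feet))
else:
    print('no match')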

Output:

ārma vi / rūmqe ca / nō
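
For the corrected spelling from Edit II (virūmque) and for the diphthongs, the same idea might be extended roughly as follows. This is a hedged sketch rather than a complete scanner: the names CONS, LONG, SHORT, FOOT and scan are made up for illustration, and it assumes that every ae/au/ei/eu/oe is a diphthong and that "qu" behaves as a consonant, neither of which covers every Latin word.

```python
import re

DIPHTHONG = "ae|au|ei|eu|oe"
CONS = "(?:qu|[^āēīōūaeiou])"              # assumption: "qu" counts as a consonant
LONG = f"(?:{DIPHTHONG}|[āēīōū])"          # a diphthong is treated as one long vowel
SHORT = f"(?<!q)(?!{DIPHTHONG})[aeiou]"    # not the u of "qu", not the start of a diphthong

FOOT = re.compile(
    f"{CONS}*{LONG}{CONS}*{SHORT}{CONS}*{SHORT}{CONS}*?"  # dactyl: long, short, short
    f"|{CONS}*{LONG}{CONS}*{LONG}{CONS}*?"                 # spondee: long, long
    f"|{CONS}*{LONG}{CONS}*"                               # leftover with a single long vowel
)

def scan(line):
    """Split a macron-marked line into feet; text that fits no foot is skipped."""
    return [m.group() for m in FOOT.finditer(line)]

print(" / ".join(scan("ārma virūmque canō")))  # ārma vi / rūmque ca / nō
```

The lookarounds in SHORT are what keep the regex from reading "ae" as two short vowels or the u of "que" as a vowel of its own.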
