How do I find all occurrences of a sequence of chars when preceded by a specific string?

Question

I'm trying to extract all matches from an EBML definition, which looks something like this:

| + A track
|  + Track number: 3
|  + Track UID: 724222477
|  + Track type: subtitles
...
|  + Language: eng
...
| + A track
|  + Track number: 4
|  + Track UID: 745646561
|  + Track type: subtitles
...
|  + Language: jpn
...

I want all occurrences of "Language: ???" when preceded by "Track type: subtitles". I tried several variations of this:

Track type: subtitles.*Language: (\w\w\w)

I'm using the multi-line modifier (m) in Ruby so that . also matches newlines (like the 's' modifier in other languages).

This only returns the last occurrence, which in the example above is 'jpn':

string.scan(/Track type: subtitles.*Language: (\w\w\w)/m)
=> [["jpn"]]

The result I'd like:

=> [["eng"], ["jpn"]]

What would be a correct regex to accomplish this?

Paige Ruten · Accepted Answer

You need to make your regex non-greedy by changing this:

.*

To this:

.*?

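Your regex is matching from the first occurrence of "Track type: subtitles" to the last occurrence of "Language: (\w\w\w)", because a greedy .* consumes as many characters as it can. Making it non-greedy works because .*? matches as few characters as possible, so each match stops at the first "Language:" line that follows.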
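To see the difference side by side, here is a minimal sketch using a sample input modeled on the question. The variable names (ebml, greedy, lazy) are only illustrative, and the elided "..." lines are kept as literal placeholders rather than filled in.

```ruby
# Sample input modeled on the question; the "..." lines stand in for
# the elided fields and are kept as literal text.
ebml = <<~TEXT
  | + A track
  |  + Track number: 3
  |  + Track UID: 724222477
  |  + Track type: subtitles
  ...
  |  + Language: eng
  ...
  | + A track
  |  + Track number: 4
  |  + Track UID: 745646561
  |  + Track type: subtitles
  ...
  |  + Language: jpn
  ...
TEXT

# Greedy: .* runs from the first "Track type: subtitles" to the last "Language:".
greedy = ebml.scan(/Track type: subtitles.*Language: (\w\w\w)/m)

# Non-greedy: .*? stops at the first "Language:" after each "Track type: subtitles".
lazy = ebml.scan(/Track type: subtitles.*?Language: (\w\w\w)/m)

p greedy # => [["jpn"]]
p lazy   # => [["eng"], ["jpn"]]
```

One caveat, assuming the real output can vary: if a subtitles track has no Language line at all, the non-greedy pattern will still run on to the next track's Language, so this relies on every subtitles track having one.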
