puffin

Reputation: 1351

Regex to return 2 groups

I need to extract two groups from the data below. The first group is the three-letter uppercase code between var and _Destinations. The second group is every three-letter code (without the single quotes) inside Array(...), skipping any line that begins with //.

Below is the regex I have so far. Any help is appreciated.

var\s([A-Z]{3})_Destinations\s*=\snew\sArray\((?:,?)|(\'([A-Z]{3})\')*

var Dests = new Array ('KIR','SEN','MAN','NCL','RNS','SNN',0); #Don't need any of this

//var NOC_Destinations  = new Array('BHX'); # Don't need any of this
var ABZ_Destinations    = new Array('DUB'); # Need this
//var RNS_Destinations  = new Array('ORK','DUB'); # Don't need this
var BHX_Destinations    = new Array('ORK','DUB','SNN'); # Need this

Upvotes: 1

Views: 74

Answers (1)

Jon Clements

Reputation: 142156

While @thefourtheye is right, as long as your use-case is limited to the example provided, you could do:

text = """
//var NOC_Destinations  = new Array('BHX'); # Don't need any of this
var ABZ_Destinations    = new Array('DUB'); # Need this
//var RNS_Destinations  = new Array('ORK','DUB'); # Don't need this
var BHX_Destinations    = new Array('ORK','DUB','SNN'); # Need this
"""

import re
import ast

# Raw string so that \( and \) reach the regex engine as written.
from_to = {frm: ast.literal_eval(to) for frm, to in re.findall(r'^var ([A-Z]{3})_Destinations.*?\((.*?)\)', text, flags=re.M)}
# {'BHX': ('ORK', 'DUB', 'SNN'), 'ABZ': 'DUB'}

You may wish to normalise the "to" values, since a single code parses to a bare string while several codes parse to a tuple. For example, you could make sure they are all lists. Something like:

def to_list(text):
    parsed = ast.literal_eval(text)
    if isinstance(parsed, str):  # on Python 2, check against basestring instead
        return [parsed]
    return list(parsed)


from_to = {frm: to_list(to) for frm, to in re.findall(r'^var ([A-Z]{3})_Destinations.*?\((.*?)\)', text, flags=re.M)}
# {'BHX': ['ORK', 'DUB', 'SNN'], 'ABZ': ['DUB']}
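If you would rather not evaluate the captured text at all, a two-step regex may be enough. This is only a sketch, assuming every destination is a quoted three-letter uppercase code and each declaration fits on one line. The names LINE_RE, CODE_RE and parse_destinations are made up for illustration. The first pattern picks out uncommented var XXX_Destinations lines, and the second pulls the codes out of the parentheses, so every value comes back as a list without any normalising step.

```python
import re

text = """\
var Dests = new Array ('KIR','SEN','MAN','NCL','RNS','SNN',0);
//var NOC_Destinations  = new Array('BHX');
var ABZ_Destinations    = new Array('DUB');
//var RNS_Destinations  = new Array('ORK','DUB');
var BHX_Destinations    = new Array('ORK','DUB','SNN');
"""

# Step 1: match only lines that start with "var" (so "//var ..." is skipped)
# and capture the origin code plus everything between the parentheses.
LINE_RE = re.compile(
    r"^var\s+([A-Z]{3})_Destinations\s*=\s*new\s+Array\s*\((.*?)\)", re.M
)

# Step 2: pull each quoted three-letter code out of the captured arguments.
CODE_RE = re.compile(r"'([A-Z]{3})'")


def parse_destinations(source):
    """Map each origin code to the list of its destination codes."""
    return {
        origin: CODE_RE.findall(args)
        for origin, args in LINE_RE.findall(source)
    }


from_to = parse_destinations(text)
print(from_to)
```

Because findall always returns a list, a single destination such as 'DUB' comes back as ['DUB'] rather than a bare string.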

Upvotes: 1
