Reputation: 1351
I have the following data that I need to extract 2 groups from. I need the 3-letter code in caps between var and Destinations, and in the second group, I need all the 3-letter codes (without single quotes) after Array, but not the codes on lines that begin with //.
Below is the regex I have so far, followed by the sample data. Any help is appreciated.
var\s([A-Z]{3})_Destinations\s*=\snew\sArray\((?:,?)|(\'([A-Z]{3})\')*
var Dests = new Array ('KIR','SEN','MAN','NCL','RNS','SNN',0); #Don't need any of this
//var NOC_Destinations = new Array('BHX'); # Don't need any of this
var ABZ_Destinations = new Array('DUB'); # Need this
//var RNS_Destinations = new Array('ORK','DUB'); # Don't need this
var BHX_Destinations = new Array('ORK','DUB','SNN'); # Need this
Upvotes: 1
Views: 74
Reputation: 142156
While @thefourtheye is right, as long as your use-case is limited to the example provided, you could do:
text = """
//var NOC_Destinations = new Array('BHX'); # Don't need any of this
var ABZ_Destinations = new Array('DUB'); # Need this
//var RNS_Destinations = new Array('ORK','DUB'); # Don't need this
var BHX_Destinations = new Array('ORK','DUB','SNN'); # Need this
"""
import re
import ast
# ^ with re.M anchors at each line start, so lines beginning with // are skipped
from_to = {frm: ast.literal_eval(to) for frm, to in re.findall(r'^var ([A-Z]{3})_Destinations.*?\((.*?)\)', text, flags=re.M)}
# {'BHX': ('ORK', 'DUB', 'SNN'), 'ABZ': 'DUB'}
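The mixed value types in that result come from how ast.literal_eval parses the captured text: a single quoted code evaluates to a string, while comma-separated codes evaluate to a tuple. A quick check, using two of the argument lists from the sample data:

```python
import ast

# One quoted code: parsed as a plain string.
single = ast.literal_eval("'DUB'")
# Several comma-separated codes: parsed as a tuple.
several = ast.literal_eval("'ORK','DUB','SNN'")

print(type(single).__name__, single)    # str DUB
print(type(several).__name__, several)  # tuple ('ORK', 'DUB', 'SNN')
```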
You may wish to normalise the to values somehow, for example by making sure they're all strings, or all tuples/lists. Something like:
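def to_list(text):
    parsed = ast.literal_eval(text)
    # A lone quoted code parses to a string; wrap it so every value is a list.
    # (On Python 2, test against basestring instead of str.)
    if isinstance(parsed, str):
        return [parsed]
    return list(parsed)

from_to = {frm: to_list(to) for frm, to in re.findall(r'^var ([A-Z]{3})_Destinations.*?\((.*?)\)', text, flags=re.M)}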
# {'BHX': ['ORK', 'DUB', 'SNN'], 'ABZ': ['DUB']}
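If you would rather not evaluate the captured text at all, a regex-only variant is possible. The following is a minimal sketch, not a general JavaScript parser: it assumes every wanted line starts with var at the beginning of the line, and the names line_re and code_re are just illustrative. One pattern selects the lines, and a second pulls the quoted codes out of the argument list.

```python
import re

# Sample input copied from the question.
text = """\
var Dests = new Array ('KIR','SEN','MAN','NCL','RNS','SNN',0);
//var NOC_Destinations = new Array('BHX');
var ABZ_Destinations = new Array('DUB');
//var RNS_Destinations = new Array('ORK','DUB');
var BHX_Destinations = new Array('ORK','DUB','SNN');
"""

# Step 1: match only lines that begin with "var XXX_Destinations".
# Commented-out lines begin with "//", so the ^ anchor (with re.M) skips them.
line_re = re.compile(
    r"^var\s+([A-Z]{3})_Destinations\s*=\s*new\s+Array\s*\((.*?)\)", re.M
)

# Step 2: extract every quoted three-letter code from the argument list.
code_re = re.compile(r"'([A-Z]{3})'")

from_to = {frm: code_re.findall(args) for frm, args in line_re.findall(text)}
print(from_to)
```

Every value is already a list here, so no normalising step is needed, and the var Dests line is ignored because its name does not fit the XXX_Destinations pattern.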
Upvotes: 1