Reputation: 10995
I want to match a list of variables which look like directories, e.g.:
Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123
Same/Same2/Battery/Name=SomeString
Same/Same2/Home/Land/Some/More/Stuff=0.34
The length of the "subdirectories" is variable having an upper bound (above it's 9). I want to group every subdirectory except the 1st one which I named "Same" above.
The best I could come up with is:
^(?:([^/]+)/){4,8}([^/]+)=(.*)
It already looks for 4-8 subdirectories but only groups the last one. Why's that? Is there a better solution using group quantifiers?
Edit: Solved. Will use split() instead.
Upvotes: 3
Views: 1997
Reputation:
I probably misunderstood what exactly you want to do, but here is how you would do it without regex:
for entry in list_of_vars:
key, value = entry.split('=')
key_components = key.split('/')
if 4 <= len(key_components) <= 8:
# here the actual work is done
print "%s=%s" % ('_'.join(key_components[1:]).upper(), value)
Upvotes: 2
Reputation: 63762
A couple of variations on your theme. For one, I've always found regexen to be cryptic to the point of unmaintainable, so I wrote the pyparsing module. In my mind, I look at your code and think, "oh, it's a list of '/'-delimited strings, an '=' sign, and then some kind of rvalue." And that translates pretty directly into the pyparsing parser definition code. By adding a name here and there in the parser ("key" and "value", similar to named groups in regex), the output is pretty easily processed.
data="""\
Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123
Same/Same2/Battery/Name=SomeString
Same/Same2/Home/Land/Some/More/Stuff=0.34""".splitlines()
from pyparsing import Word, alphas, alphanums, Word, nums, QuotedString, delimitedList
wd = Word(alphas, alphanums)
number = Word(nums+'+-', nums+'.').setParseAction(lambda t:float(t[0]))
rvalue = wd | number | QuotedString('"')
defn = delimitedList(wd, '/')('key') + '=' + rvalue('value')
for d in data:
result = defn.parseString(d)
Second, I question your approach at defining all of those variable names - creating variable names on the fly based on your data is a pretty well-recognized Code Smell (not necessarily bad, but you might really want to rethink this approach). I used a recursive defaultdict to create a navigable structure so that you can easily do operations like "find all the entries that are sub-elements of "Same2" (in this case, "Foot", "Battery", and "Home") - this kind of work is more difficult when trying to sift through some collection of variable names as found in locals(), it seems to me you will end up re-parsing these names to reconstruct the key hierarchy.
from collections import defaultdict
class recursivedefaultdict(defaultdict):
def __init__(self, attrFactory=int):
self.default_factory = lambda : type(self)(attrFactory)
self._attrFactory = attrFactory
def __getattr__(self, attr):
newval = self._attrFactory()
setattr(self, attr, newval)
return newval
table = recursivedefaultdict()
# parse each entry, and accumulate into hierarchical dict
for d in data:
# use pyparsing parser, gives us key (list of names) and value
result = defn.parseString(d)
t = table
for k in result.key[:-1]:
t = t[k]
t[result.key[-1]] = result.value
# recursive method to iterate over hierarchical dict
def showTable(t, indent=''):
for k,v in t.items():
print indent+k,
if isinstance(v,dict):
print
showTable(v, indent+' ')
else:
print v
showTable(table)
Prints:
Same
Same2
Foot
Ankle
Joint
Actuator
Sensor
Temperature
Value 4.123
Battery
Name SomeString
Home
Land
Some
More
Stuff 0.34
If you are really set on defining those variable names, then adding some helpful parse actions to pyparsing will reformat the parsed data at parse time, so that it's directly processable afterwards:
wd = Word(alphas, alphanums)
number = Word(nums+'+-', nums+'.').setParseAction(lambda t:float(t[0]))
rvaluewd = wd.copy().setParseAction(lambda t: '"%s"' % t[0])
rvalue = rvaluewd | number | QuotedString('"')
defn = delimitedList(wd, '/')('key') + '=' + rvalue('value')
def joinNamesWithAllCaps(tokens):
tokens["key"] = '_'.join(map(str.upper, tokens.key))
defn.setParseAction(joinNamesWithAllCaps)
for d in data:
result = defn.parseString(d)
print result.key,'=', result.value
Prints:
SAME_SAME2_FOOT_ANKLE_JOINT_ACTUATOR_SENSOR_TEMPERATURE_VALUE = 4.123
SAME_SAME2_BATTERY_NAME = "SomeString"
SAME_SAME2_HOME_LAND_SOME_MORE_STUFF = 0.34
(Note that this also encloses your SomeString value in quotes, so that the resulting assignment statement is valid Python.)
Upvotes: 1
Reputation: 27585
import re
regx = re.compile('(?:(?<=\A)|(?<=/)).+?(?=/|\Z)')
for ss in ('Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123',
'Same/Same2/Battery/Name=SomeString',
'Same/Same2/Home/Land/Some/More/Stuff=0.34'):
print ss
print regx.findall(ss)
print
Now you have given more info on what you want to obtain ( _"Same/Same2/Battery/Name=SomeString becoming SAME2_BATTERY_NAME=SomeString"_ ) better solutions can be proposed: either with a regex or with split() , + replace()
import re
from os import sep
sep2 = r'\\' if sep=='\\' else '/'
pat = '^(?:.+?%s)(.+$)' % sep2
print 'pat==%s\n' % pat
ragx = re.compile(pat)
for ss in ('Same\Same2\Foot\Ankle\Joint\Actuator\Sensor\Temperature\Value=4.123',
'Same\Same2\Battery\Name=SomeString',
'Same\Same2\Home\Land\Some\More\Stuff=0.34'):
print ss
print ragx.match(ss).group(1).replace(sep,'_')
print ss.split(sep,1)[1].replace(sep,'_')
print
result
pat==^(?:.+?\\)(.+$)
Same\Same2\Foot\Ankle\Joint\Actuator\Sensor\Temperature\Value=4.123
Same2_Foot_Ankle_Joint_Actuator_Sensor_Temperature_Value=4.123
Same2_Foot_Ankle_Joint_Actuator_Sensor_Temperature_Value=4.123
Same\Same2\Battery\Name=SomeString
Same2_Battery_Name=SomeString
Same2_Battery_Name=SomeString
Same\Same2\Home\Land\Some\More\Stuff=0.34
Same2_Home_Land_Some_More_Stuff=0.34
Same2_Home_Land_Some_More_Stuff=0.34
Re-reading your comment, I realized that I didn't take in account that you want to upper the part of the strings that lies before the '=' sign but not after it.
Hence, this new code that exposes 3 methods that answer this requirement. You will choose which one you prefer:
import re
from os import sep
sep2 = r'\\' if sep=='\\' else '/'
pot = '^(?:.+?%s)(.+?)=([^=]*$)' % sep2
print 'pot==%s\n' % pot
rogx = re.compile(pot)
pet = '^(?:.+?%s)(.+?(?==[^=]*$))' % sep2
print 'pet==%s\n' % pet
regx = re.compile(pet)
for ss in ('Same\Same2\Foot\Ankle\Joint\Sensor\Value=4.123',
'Same\Same2\Battery\Name=SomeString',
'Same\Same2\Ocean\Atlantic\North=',
'Same\Same2\Maths\Addition\\2+2=4\Simple=ohoh'):
print ss + '\n' + len(ss)*'-'
print 'rogx groups '.rjust(32),rogx.match(ss).groups()
a,b = ss.split(sep,1)[1].rsplit('=',1)
print 'split split '.rjust(32),(a,b)
print 'split split join upper replace %s=%s' % (a.replace(sep,'_').upper(),b)
print 'regx split group '.rjust(32),regx.match(ss.split(sep,1)[1]).group()
print 'regx split sub '.rjust(32),\
regx.sub(lambda x: x.group(1).replace(sep,'_').upper(), ss)
print
result, on a Windows platform
pot==^(?:.+?\\)(.+?)=([^=]*$)
pet==^(?:.+?\\)(.+?(?==[^=]*$))
Same\Same2\Foot\Ankle\Joint\Sensor\Value=4.123
----------------------------------------------
rogx groups ('Same2\\Foot\\Ankle\\Joint\\Sensor\\Value', '4.123')
split split ('Same2\\Foot\\Ankle\\Joint\\Sensor\\Value', '4.123')
split split join upper replace SAME2_FOOT_ANKLE_JOINT_SENSOR_VALUE=4.123
regx split group Same2\Foot\Ankle\Joint\Sensor\Value
regx split sub SAME2_FOOT_ANKLE_JOINT_SENSOR_VALUE=4.123
Same\Same2\Battery\Name=SomeString
----------------------------------
rogx groups ('Same2\\Battery\\Name', 'SomeString')
split split ('Same2\\Battery\\Name', 'SomeString')
split split join upper replace SAME2_BATTERY_NAME=SomeString
regx split group Same2\Battery\Name
regx split sub SAME2_BATTERY_NAME=SomeString
Same\Same2\Ocean\Atlantic\North=
--------------------------------
rogx groups ('Same2\\Ocean\\Atlantic\\North', '')
split split ('Same2\\Ocean\\Atlantic\\North', '')
split split join upper replace SAME2_OCEAN_ATLANTIC_NORTH=
regx split group Same2\Ocean\Atlantic\North
regx split sub SAME2_OCEAN_ATLANTIC_NORTH=
Same\Same2\Maths\Addition\2+2=4\Simple=ohoh
-------------------------------------------
rogx groups ('Same2\\Maths\\Addition\\2+2=4\\Simple', 'ohoh')
split split ('Same2\\Maths\\Addition\\2+2=4\\Simple', 'ohoh')
split split join upper replace SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh
regx split group Same2\Maths\Addition\2+2=4\Simple
regx split sub SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh
Upvotes: 2
Reputation: 53516
Just use split?
>>> p='Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123'
>>> p.split('/')
['Same', 'Same2', 'Foot', 'Ankle', 'Joint', 'Actuator', 'Sensor', 'Temperature', 'Value=4.123']
Also, if you want that key/val pair you can do something like this...
>>> s = p.split('/')
>>> s[-1].split('=')
['Value', '4.123']
Upvotes: 1