Reputation: 1208
I have a list of strings. Each element represents a field as a key and a value separated by a space:
listA = [
    'abcd1-2 4d4e',
    'xyz0-1 551',
    'foo 3ea',
    'bar1 2bd',
    'mc-mqisd0-2 77a'
]
I need to return a dict out of this list, expanding keys like 'xyz0-1' by the range denoted by 0-1 into multiple keys, so that, for example, 'abcd1-2' becomes abcd1 and abcd2, each with the same value (4d4e).
It should run as part of an Ansible plugin, where Python 2.7 is used.
The end result would look like the dict below:
{
    abcd1: 4d4e,
    abcd2: 4d4e,
    xyz0: 551,
    xyz1: 551,
    foo: 3ea,
    bar1: 2bd,
    mc-mqisd0: 77a,
    mc-mqisd1: 77a,
    mc-mqisd2: 77a,
}
I have written the function below. It works with Python 3.
def listFln(listA):
    import re
    fL = []
    for i in listA:
        aL = i.split()[0]
        bL = i.split()[1]
        comp = re.sub('^(.+?)(\d+-\d+)?$',r'\1',aL)
        cmpCountR = re.sub('^(.+?)(\d+-\d+)?$',r'\2',aL)
        if cmpCountR.strip():
            nStart = int(cmpCountR.split('-')[0])
            nEnd = int(cmpCountR.split('-')[1])
            for j in range(nStart,nEnd+1):
                fL.append(comp + str(j) + ' ' + bL)
        else:
            fL.append(i)
    return(dict([k.split() for k in fL]))
In older Python versions such as Python 2.7, this code throws an "unmatched group" error:
    cmpCountR = re.sub('^(.+?)(\d+-\d+)?$',r'\2',aL)
  File "/usr/lib64/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib64/python2.7/re.py", line 275, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/lib64/python2.7/sre_parse.py", line 800, in expand_template
    raise error, "unmatched group"
Is anything wrong with the regex here?
Upvotes: 2
Views: 178
Reputation: 163287
You could use a single pattern with 4 capture groups, and check if the 3rd capture group value is not empty.
^(\S*?)(?:(\d+)-(\d+))?\s+(.*)
The pattern matches:
^ : Start of string
(\S*?) : Capture group 1, match optional non-whitespace chars, as few as possible
(?:(\d+)-(\d+))? : Optionally capture 1+ digits in group 2 and group 3 with a - in between
\s+ : Match 1+ whitespace chars
(.*) : Capture group 4, match the rest of the line
Code example (works on Python 2 and Python 3)
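import re

strings = [
    'abcd1-2 4d4e',
    'xyz0-1 551',
    'foo 3ea',
    'bar1 2bd',
    'mc-mqisd0-2 77a'
]

def listFln(listA):
    dct = {}
    for s in listA:
        # findall returns a list of group tuples; sum(..., ()) flattens it into one tuple
        lst = sum(re.findall(r"^(\S*?)(?:(\d+)-(\d+))?\s+(.*)", s), ())
        if not lst:
            continue  # skip lines that do not match the pattern
        if lst[2]:
            # a range was captured: expand it into one key per number
            for i in range(int(lst[1]), int(lst[2]) + 1):
                dct[lst[0] + str(i)] = lst[3]
        else:
            dct[lst[0]] = lst[3]
    return dct

print(listFln(strings))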
Output
{
    'abcd1': '4d4e',
    'abcd2': '4d4e',
    'xyz0': '551',
    'xyz1': '551',
    'foo': '3ea',
    'bar1': '2bd',
    'mc-mqisd0': '77a',
    'mc-mqisd1': '77a',
    'mc-mqisd2': '77a'
}
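To see why the code tests the third group for truthiness, here is a small sketch (not part of the original answer) that prints what the pattern captures for a key with and without a range. It uses only the standard re module; the sample lines are taken from the question.

```python
import re

# The pattern from this answer, compiled once for reuse.
pattern = re.compile(r"^(\S*?)(?:(\d+)-(\d+))?\s+(.*)")

for line in ['abcd1-2 4d4e', 'foo 3ea']:
    # match.groups() reports unmatched optional groups as None ...
    print(pattern.match(line).groups())
    # ... while findall reports them as empty strings.
    print(pattern.findall(line))

# ('abcd', '1', '2', '4d4e')
# [('abcd', '1', '2', '4d4e')]
# ('foo', None, None, '3ea')
# [('foo', '', '', '3ea')]
```

Either way, the range groups are falsy when the key has no range, so a plain truth test is enough to pick the right branch.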
Upvotes: 2
Reputation: 9377
I used Python 2.7 to reproduce this. This answer explains the issue with unmatched backreferences in re.sub in Python 2.7 and shows some patterns to fix it.
import re
# both seem identical
regex1 = '^(.+?)(\d+-\d+)?$'
regex2 = '^(.+?)(\d+-\d+)?$'
# also the compiled pattern is identical, see hash
re.compile(regex1) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
re.compile(regex2) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
Note: Compiling the pattern once with re.compile() saves time when it is reused multiple times, for example inside a loop.
The error message indicates that a group did not take part in the match. In other words, the replacement template passed to re.sub (see the docs for 2.7) references a group, here the second capturing group (\2), that was not found or captured in the given input string. Python 3.5 and later substitute an empty string for such an unmatched group, whereas Python 2.7 raises:
sre_constants.error: unmatched group
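As a minimal sketch of the cause, the snippet below shows that group 2 is None for a key without a range. It also shows one possible workaround that keeps re.sub: passing a function as the replacement, so no template with \2 is expanded. The helper name group_or_empty is made up for this illustration, and the snippet is expected, though not verified here, to behave the same on Python 2.7 and Python 3.

```python
import re

# The pattern from the question: a key followed by an optional "N-M" range.
pattern = re.compile(r'^(.+?)(\d+-\d+)?$')

def group_or_empty(index):
    """Hypothetical helper: build a replacement function that returns
    group `index`, or '' if that group did not take part in the match."""
    def replace(match):
        return match.group(index) or ''
    return replace

# group(2) is None for a key without a range; this is what trips r'\2' on Python 2.7.
print(pattern.match('foo').groups())      # ('foo', None)
print(pattern.match('abcd1-2').groups())  # ('abcd', '1-2')

# A callable replacement is called with the match object instead of expanding a template.
print(repr(pattern.sub(group_or_empty(2), 'foo')))      # ''
print(repr(pattern.sub(group_or_empty(2), 'abcd1-2')))  # '1-2'
```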
To fix this, we should test which groups were actually found in the match. Therefore we use re.match(regex, str), or the compiled variant pattern.match(str), to create a Match object, and then Match.groups() to return all found groups as a tuple.
import re

regex = r'^(.+?)(\d+-\d+)?$'  # a key followed by an optional digits range
pattern = re.compile(regex)  # <_sre.SRE_Pattern object at 0x7f575ef8fd40>

def dict_with_expanded_digits(fields_list):
    entry_list = []
    for fields in fields_list:
        (key_digits_range, value) = fields.split()  # a pair of ('key0-1', 'value')

        # test for match and groups found
        match = pattern.match(key_digits_range)
        # skip to the next iteration here if the key does not match the pattern
        if not match:
            print('ERROR: no valid key! Will not add to dict.', fields)
            continue
        print("DEBUG: groups:", match.groups())  # tuple containing all the subgroups of the match,
        # watch: the 3rd iteration has only group(1), while group(2) is None

        # if no 2nd group, only a single key,value
        if not match.group(2):
            print('WARN: key without range! Will add as single entry:', fields)
            entry_list.append((key_digits_range, value))
            continue  # stop iteration here and continue with next

        # safe now: both groups matched, so \1 and \2 can be expanded
        key = pattern.sub(r'\1', key_digits_range)
        index_range = pattern.sub(r'\2', key_digits_range)

        # no strip needed here
        (start, end) = index_range.split('-')
        for index in range(int(start), int(end)+1):
            expanded_key = "{}{}".format(key, index)
            entry = (expanded_key, value)  # use tuple for each field entry (key, value)
            entry_list.append(entry)
    return dict(entry_list)

list_a = [
    'abcd1-2 4d4e',  # 2 entries
    'xyz0-1 551',  # 2 entries
    'foo 3ea',  # 1 entry
    'bar1 2bd',  # 1 entry
    'mc-mqisd0-2 77a'  # 3 entries
]

dict_a = dict_with_expanded_digits(list_a)
print("INFO: resulting dict with length: ", len(dict_a), dict_a)
assert len(dict_a) == 9
Prints:
('DEBUG: groups:', ('abcd', '1-2'))
('DEBUG: groups:', ('xyz', '0-1'))
('DEBUG: groups:', ('foo', None))
('WARN: key without range! Will add as single entry:', 'foo 3ea')
('DEBUG: groups:', ('bar1', None))
('WARN: key without range! Will add as single entry:', 'bar1 2bd')
('DEBUG: groups:', ('mc-mqisd', '0-2'))
('INFO: resulting dict with length: ', 9, {'bar1': '2bd', 'foo': '3ea', 'mc-mqisd2': '77a', 'mc-mqisd0': '77a', 'mc-mqisd1': '77a', 'xyz1': '551', 'xyz0': '551', 'abcd1': '4d4e', 'abcd2': '4d4e'})
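Notes on the code above:
(start, end) unpacks the two parts of the split range in a single assignment.
Instead of the re. functions, the equivalent methods of the compiled pattern (pattern.) are used.
if not match.group(2): avoids expanding the field and just adds the key-value pair as is.
assert verifies that the given list of 5 is expanded to a dict of 9, as expected.
Upvotes: 2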
Reputation: 3409
Here's a simpler version using findall instead of sub, successfully tested on 2.7. It also creates the dict directly instead of first building a list:
mylist=[
    'abcd1-2 4d4e',
    'xyz0-1 551',
    'foo 3ea',
    'bar1 2bd',
    'mc-mqisd0-2 77a'
]

def listFln(listA):
    import re
    fL = {}
    for i in listA:
        aL = i.split()[0]
        bL = i.split()[1]
        comp = re.findall('^(.+?)(\d+-\d+)?$',aL)[0]
        if comp[1]:
            nStart = int(comp[1].split('-')[0])
            nEnd = int(comp[1].split('-')[1])
            for j in range(nStart,nEnd+1):
                fL[comp[0]+str(j)] = bL
        else:
            fL[comp[0]] = bL
    return fL

print(listFln(mylist))
# {'abcd1': '4d4e',
# 'abcd2': '4d4e',
# 'xyz0': '551',
# 'xyz1': '551',
# 'foo': '3ea',
# 'bar1': '2bd',
# 'mc-mqisd0': '77a',
# 'mc-mqisd1': '77a',
# 'mc-mqisd2': '77a'}
Upvotes: 2