Reputation: 1587
I have strings in the format "1-3 6:10-11 7-9", and from them I want to create number sets such as {1,2,3,6,10,11,7,8,9}.
To create a set from a range of numbers, I have the following code:
def create_set(src):
    lset = []
    if len(src) > 0:
        pos = src.find('-')
        if pos != -1:
            first = int(src[:pos])
            last = int(src[pos+1:])
        else:
            return {int(src)}  # Only one number
        for j in range(first, last+1):
            lset.append(j)
    return set(lset)
But I cannot figure out how to correctly treat the ':' when it appears in the string. Can someone help me?
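A minimal sketch of one way to handle the ':' with the function above, assuming ':' should simply act as a separator like a space (which matches the expected set in the question): split the string on runs of whitespace and colons with `re.split`, then merge the sets that `create_set` returns. The helper name `parse_ranges` is hypothetical, and `create_set` here is a copy of the question's function adjusted to always return a set.

```python
import re


def create_set(src):
    """Return the set of integers described by 'a-b' (inclusive) or 'a'."""
    if not src:
        return set()
    pos = src.find('-')
    if pos == -1:
        return {int(src)}  # only one number
    first = int(src[:pos])
    last = int(src[pos + 1:])
    return set(range(first, last + 1))


def parse_ranges(text):
    """Hypothetical helper: treat spaces and ':' alike as separators."""
    result = set()
    for token in re.split(r"[\s:]+", text.strip()):
        result |= create_set(token)  # an empty token yields an empty set
    return result


print(parse_ranges("1-3 6:10-11 7-9"))
```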
Thanks in advance!
EDIT: By the way, is there a more compact way of parsing such strings, perhaps using regular expressions?
Upvotes: 1
Views: 1475
Reputation: 836
EDIT: By the way, is there a more compact way of parsing such strings, perhaps using regular expressions?
Perhaps a cleaner (and slightly more efficient) way:
import re
import itertools

s = "1-3 6:10-11 7-9"
allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)
expanded = [list(range(int(x), (int(x) if y == '' else int(y)) + 1)) for x, y in allGroups]
print({x for x in itertools.chain.from_iterable(expanded)})
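Note that this pattern only picks up a single number when it is followed by ':'; a standalone number such as the 12 in "7-9 12" is silently skipped. A sketch of a variant, assuming single numbers may appear anywhere, makes the '-b' part optional instead (`re.findall` returns '' for the optional group when it does not participate). The function name `parse_ranges_re` is hypothetical.

```python
import itertools
import re


def parse_ranges_re(s):
    """Hypothetical helper: expand 'a-b' ranges and lone numbers 'a' into a set."""
    # The '-b' part is optional, so a lone number matches whether or not
    # it is followed by ':'; findall returns '' for the missing group.
    groups = re.findall(r"(\d+)(?:-(\d+))?", s)
    expanded = (range(int(x), int(y or x) + 1) for x, y in groups)
    return set(itertools.chain.from_iterable(expanded))


print(parse_ranges_re("1-3 6:10-11 7-9 12"))
```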
Explanations:
Match all substrings of the form 'a-b' or 'a:' and return a list of (a, b) and (a, '') pairs, respectively:
allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)
This produces:
[('1', '3'), ('6', ''), ('10', '11'), ('7', '9')]
Use a list comprehension to expand each pair (x, y) into the full list of numbers in range(x, y + 1), handling the (x, '') case as range(x, x + 1):
expanded = [list(range(int(x), (int(x) if y == '' else int(y)) + 1)) for x, y in allGroups]
This produces:
[[1, 2, 3], [6], [10, 11], [7, 8, 9]]
Use itertools.chain.from_iterable() to flatten the list of lists into a single iterable, which a set comprehension then iterates over to build the final set:
print({x for x in itertools.chain.from_iterable(expanded)})
This produces:
set([1, 2, 3, 6, 7, 8, 9, 10, 11])
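Since `set()` accepts any iterable, the set comprehension in the last step can be replaced by passing the chained iterable straight to `set()`. A minimal equivalent, with `expanded` rebuilt from the pairs listed above:

```python
import itertools

# Pairs as returned by re.findall in the first step
allGroups = [('1', '3'), ('6', ''), ('10', '11'), ('7', '9')]
expanded = [list(range(int(x), (int(x) if y == '' else int(y)) + 1)) for x, y in allGroups]

# set() consumes the flattened iterable directly; no comprehension needed
result = set(itertools.chain.from_iterable(expanded))
print(result)
```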
Upvotes: 1
Reputation: 25548
Something like this might work for you:
s = '1-3 6:10-11 7-9'
s = s.replace(':', ' ')
lset = set()
fs = s.split()
for f in fs:
    r = f.split('-')
    if len(r) == 1:
        # add a single number
        lset.add(int(r[0]))
    else:
        # add a range of numbers (inclusive of the endpoints)
        lset |= set(range(int(r[0]), int(r[1]) + 1))
print(lset)
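The loop above can also be packaged as a reusable function. A minimal sketch, assuming every token is either a single number or a single 'a-b' range: `str.partition('-')` splits each token into its two endpoints (the second is empty for a lone number), and `set.update` adds the whole inclusive range at once. The function name `expand_ranges` is hypothetical.

```python
def expand_ranges(text):
    """Hypothetical helper: expand '1-3 6:10-11 7-9' style strings into a set."""
    result = set()
    # Treat ':' as a separator, exactly like a space
    for token in text.replace(':', ' ').split():
        first, _, last = token.partition('-')
        # For a lone number 'last' is '', so the range covers just 'first'
        result.update(range(int(first), int(last or first) + 1))
    return result


print(expand_ranges('1-3 6:10-11 7-9'))
```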
Upvotes: 5