maurobio

Reputation: 1587

Python - Create set from string

I have strings in the format "1-3 6:10-11 7-9" and from them I want to create number sets as follows {1,2,3,6,10,11,7,8,9}.

For creating the set from the range of numbers, I have the following code:

def create_set(src):
    lset = []
    if len(src) > 0:
        pos = src.find('-')
        if pos != -1:
            first = int(src[:pos])
            last  = int(src[pos+1:])
        else:
            return {int(src)}  # Only one number; return a set, like the range case
        for j in range(first, last + 1):
            lset.append(j)
    return set(lset)  # Empty input yields an empty set

But I cannot figure out how to correctly treat the ':' when it appears in the string. Can someone help me?

Thanks in advance!

EDIT: By the way, is there a more compact way of parsing such strings, perhaps using regular expressions?

Upvotes: 1

Views: 1475

Answers (2)

FujiApple

Reputation: 836

In reply to the question's edit ("By the way, is there a more compact way of parsing such strings, perhaps using regular expressions?"):

Perhaps a cleaner (and slightly more efficient) way:

import re
import itertools

s = "1-3 6:10-11 7-9"  # example input from the question
allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)
expanded = [range(int(x), (int(x) if y == '' else int(y)) + 1) for x, y in allGroups]
print({x for x in itertools.chain.from_iterable(expanded)})

Explanations:

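Match all substrings of the form 'a-b' or 'a:' and return a list of (a, b) and (a, '') pairs respectively. Note that a bare number followed by neither '-' nor ':' (such as the 5 in '5 7-9') is not matched by this pattern: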

allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)

This produces:

[('1', '3'), ('6', ''), ('10', '11'), ('7', '9')]
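As a quick check of what this pattern does and does not match, the snippet below runs it on the question's string and on a second, made-up input ("5 7-9" is an assumption for illustration, not something from the question). It is only a sketch of how re.findall reports the two capture groups.

```python
import re

# Pattern copied from the answer: a number followed by either "-<number>" or ":".
PATTERN = r"(\d+)(?:-(\d+)|:)"

# With two capture groups, re.findall returns one (first, second) tuple per
# match; a group that took no part in the match is reported as ''.
pairs = re.findall(PATTERN, "1-3 6:10-11 7-9")
print(pairs)  # [('1', '3'), ('6', ''), ('10', '11'), ('7', '9')]

# A number followed by neither '-' nor ':' is skipped entirely.
# The input "5 7-9" is a hypothetical example, not taken from the question.
bare = re.findall(PATTERN, "5 7-9")
print(bare)  # [('7', '9')]
```

So the pattern fits the question's example, but inputs that contain space-separated single numbers would need a different pattern.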

Using a list comprehension, expand each (x, y) pair into the inclusive range of numbers from x to y, that is range(x, y + 1), taking care to handle the (x, '') case as range(x, x + 1):

expanded = [range(int(x), (int(x) if y == '' else int(y)) + 1) for x, y in allGroups]

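This produces (under Python 2, where range() returns a list; under Python 3 each element is a range object covering the same numbers):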

[[1, 2, 3], [6], [10, 11], [7, 8, 9]]

Use itertools.chain.from_iterable() to flatten the list of ranges into a single iterable, which a set comprehension then collects into the final set:

print({x for x in itertools.chain.from_iterable(expanded)})

This produces (Python 2 display shown; Python 3 shows the same set in {...} notation):

set([1, 2, 3, 6, 7, 8, 9, 10, 11])
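The three steps can also be folded into one function that runs under Python 3. This is a minimal sketch, not the answer's exact code: the name parse_number_set is hypothetical, and the pattern is deliberately relaxed (an assumption) so that the "-<number>" part is optional and ':' or whitespace merely separates items, which lets bare numbers through as well.

```python
import itertools
import re

# Relaxed pattern (an assumption, not the answer's): the "-<number>" part is
# optional, so ':' and whitespace simply act as separators between items.
ITEM = re.compile(r"(\d+)(?:-(\d+))?")


def parse_number_set(text):
    """Return the set of integers described by text such as '1-3 6:10-11 7-9'."""
    ranges = (
        # A missing end ('' from the optional group) means a single number.
        range(int(first), int(last or first) + 1)
        for first, last in ITEM.findall(text)
    )
    # chain.from_iterable flattens the ranges; set() collects the numbers.
    return set(itertools.chain.from_iterable(ranges))


print(parse_number_set("1-3 6:10-11 7-9") == {1, 2, 3, 6, 7, 8, 9, 10, 11})  # True
print(parse_number_set("5 7-9") == {5, 7, 8, 9})  # True
```

Comparing against a set literal sidesteps the difference between how Python 2 and Python 3 display sets.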

Upvotes: 1

xnx

Reputation: 25548

Something like this might work for you:

s = '1-3 6:10-11 7-9'
s = s.replace(':', ' ')
lset = set()
fs = s.split()
for f in fs:
    r = f.split('-')
    if len(r) == 1:
        # add a single number
        lset.add(int(r[0]))
    else:
        # add a range of numbers (inclusive of the endpoints)
        lset |= set(range(int(r[0]), int(r[1])+1))
print(lset)
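The same loop can be wrapped in a reusable function. The sketch below keeps the replace-and-split idea from the code above; the name parse_ranges is hypothetical, and str.partition is swapped in for split('-') purely as a stylistic choice. It assumes well-formed input: an item such as '3-' with a missing endpoint would raise ValueError.

```python
def parse_ranges(text):
    """Expand a string such as '1-3 6:10-11 7-9' into a set of integers."""
    numbers = set()
    # Treat ':' like whitespace, then handle each item on its own.
    for item in text.replace(":", " ").split():
        first, dash, last = item.partition("-")
        # Without a '-', partition leaves dash empty: a single number.
        end = int(last) if dash else int(first)
        numbers.update(range(int(first), end + 1))  # inclusive of both endpoints
    return numbers


print(parse_ranges("1-3 6:10-11 7-9") == {1, 2, 3, 6, 7, 8, 9, 10, 11})  # True
```

Because str.split() with no argument discards empty strings, an empty input simply returns an empty set.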

Upvotes: 5
