Alvin
Alvin

Reputation: 2543

Python create list from smallest number of repeating values in a nested list

I have a Dict

whs = {
    'ID1' : ['code1', 'code2', 'code3'],
    'ID2' : ['code2', 'code5', 'code3'],
    'ID3' : ['code6', 'code7', 'code8'],
    'ID4' : ['code3', 'code5', 'code6'],
}

What I need to do is build a new list that will look like

submit = [
    {
        'codes' : ['code3', ],
        'ids' : ['ID1', 'ID2', 'ID4'],
    },
    {
        'codes' : ['code6', 'code7', 'code8'],
        'ids' : ['ID3', ],
    }
]

What I have so far

def ParseAvailable(self, whs):
    separate = whs.keys()
    submit = []
    while len(separate) > 0:
        avail = {
            'codes' : [],
            'ids' : [],
        }
        for num, item in enumerate(separate):
            if len(avail['codes']) == 0:
                avail['codes'] = whs[item]
                avail['ids'].append(item)
            else:
                avail_all = list(set(avail['codes']) & set(whs[item]))
                print '%s : %s' % (item, avail_all)
                if len(avail_all) > 0:
                    avail['codes'] = avail_all
                    avail['ids'].append(item)
            if len(avail['codes']) > 0:
                del separate[num]
        submit.append(avail)
    return submit

Which returns:

[
    {
        'ids': ['ID4', 'ID3'], 
        'codes': ['code6']
    },
    {
        'ids': ['ID2'], 
        'codes': ['code2', 'code5', 'code3']
    },
    {
        'ids': ['ID1'],
         'codes': ['code1', 'code2', 'code3']
    }
]

which COULD work except that ID1 & ID2 should be combined as

{
    'ids' : ['ID1', 'ID2',],
    'codes' : ['code2', 'code3', ]
}

Curious if there is an easier approach that I've not thought of, figure I could setup a couple more nested loops to compare everything piece by piece though it seems rather unpythonic

Thank You in Advance

Upvotes: 1

Views: 243

Answers (3)

Dave
Dave

Reputation: 3956

As others said, this is an optimization problem that isn't trivial for large datasets. But as a first step, it's at least easy to invert the list of warehouses per product to a list of products per warehouse.

whs = {
    'ID1' : ['code1', 'code2', 'code3'],
    'ID2' : ['code2', 'code5', 'code3'],
    'ID3' : ['code6', 'code7', 'code8'],
    'ID4' : ['code3', 'code5', 'code6'],
}

a = []
b = []
for k,v in whs.items():
    for j in v:
        a.append((j,k))
        b.append((j,[]))
inv = dict(b)
for j in a:
    inv[j[0]].append(j[1])

print("Inventory:")
for k,v in inv.items():
    print(k,v)

When run, this prints:

Inventory:
code1 ['ID1']
code2 ['ID2', 'ID1']
code3 ['ID4', 'ID2', 'ID1']
code5 ['ID4', 'ID2']
code6 ['ID4', 'ID3']
code7 ['ID3']
code8 ['ID3']

As you can see from the trivial example, there are no solutions that will get all products in a single order, and several solutions (code2+code6, code3+code6, code3+code7, code3+code8) that will get them in two orders. Picking the "best" would require additional information, such as pricing, to optimize on.

Upvotes: 0

Zack Bloom
Zack Bloom

Reputation: 8427

I attacked it by building a tree of all the potential additions, and then finding the cheapest option among those. Here is a working (albeit ugly and unoptimized) example:

https://gist.github.com/1288835

The tree will end up with p*w nodes, where p is the number of products and w(p) is the average number of warehouses per product.

Upvotes: 2

goldsz
goldsz

Reputation: 348

So it looks to me like you have a set cover problem, which is NP-complete. Not only that, but it appears that you want all the possible set covers of the minimum size. Your list of ids creates the universe set, and each of your codes creates a sub set of the universe.

For your simple iterative loops you don't really have a choice other then to try all possible combinations of codes. If you have large data sets this isn't going to work.

The wikipedia article mentions an efficient algorithm (greedy algorithm) that isn't guaranteed to find the best solution, but the error in the solution is bounded. This algorithm is to iteratively add the subset the contains the most uncovered elements of the universe.

If you need to get the absolutely correct answer you may want to try the Integer Programming approach (as in Linear Programming). There are software packages and libraries for this algorithm.

Upvotes: 0

Related Questions