Reputation: 10882

Python cartesian product of n lists with n unknown at coding time

Question

what is the best way to generate a cartesian product of some lists, not knowing in advance how many lists there are?

You can stop reading here if you like.

Background

I don't have money for school so I am trying to teach myself some programming using the Internet whilst working night shifts at a highway tollbooth. I have decided to try to solve some "programming challenge" problems as an exercise.

Programming assignment

Here's the problem I am trying to tackle, property of TopCoder:

http://community.topcoder.com/stat?c=problem_statement&pm=3496

I will not copy and paste the full description to respect their copyright notice but I am assuming I can summarise it, provided I don't use pieces of it verbatim (IANAL though).

Summary

If a "weighted sum" of historical stock prices is the sum of addenda obtained by multiplying a subset of these prices by an equal number of "weighting" factors, provided the latter add up to 1.0 and are chosen from the given set of valid values [-1.0, -0.9, ..., 0.9, 1.0], use this formula on all historical data supplied as an argument to your function, examining 5 prices at a time, predicting the next price and returning the permutation of "weighting factors" that yields the lowest average prediction error. There will be at least 6 stock prices in each run so at least one prediction is guaranteed, final results should be accurate within 1E-9.

Test data

Format:

One row for input data, in list format
One row for the expected result
One empty row as a spacer

Download from:

http://paste.ubuntu.com/1229857/

My solution

import itertools

# For a permutation of factors to be used in a weighted sum, it should be chosen
# such than the sum of all factors is 1.
WEIGHTED_SUM_TOTAL = 1.0
FACTORS_CAN_BE_USED_IN_WEIGHTED_SUM = lambda x: sum(x) == WEIGHTED_SUM_TOTAL

# Historical stock price data should be examined using a sliding window of width
# 5 when making predictions about the next price.
N_RECENT_PRICES = 5

# Valid values for weighting factors are: [-1.0, -0.9, ..., 0.9, 1.0]
VALID_WEIGHTS = [x / 10. for x in range(-10, 11)]

# A pre-calculated list of valid weightings to consider. This is the cartesiant
# product of the set of valid weigths considering only the combinations which
# are valid as components of a weighted sum.
CARTESIAN_PRODUCT_FACTORS = [VALID_WEIGHTS] * N_RECENT_PRICES
ALL_PERMUTATIONS_OF_WEIGHTS = itertools.product(*CARTESIAN_PRODUCT_FACTORS)
WEIGHTED_SUM_WEIGHTS = filter(FACTORS_CAN_BE_USED_IN_WEIGHTED_SUM,
                              ALL_PERMUTATIONS_OF_WEIGHTS)

# Generator function to get sliding windows of a given width from a data set
def sliding_windows(data, window_width):

  for i in range(len(data) - window_width):
    yield data[i:i + window_width], data[i + window_width]

def avg_error(data):

  # The supplied data will guarantee at least one iteration
  n_iterations = len(data) - 5

  best_average_error = None

  # Consider each valid weighting (e.g. permutation of weights)
  for weighting in WEIGHTED_SUM_WEIGHTS:

    # Keep track of the prediction errors for this weighting
    errors_for_this_weighting = []

    for historical_data, next_to_predict in sliding_windows(data,
                                                            N_RECENT_PRICES):

      prediction = sum([a * b for a, b in zip(weighting, historical_data)])
      errors_for_this_weighting.append(abs(next_to_predict - prediction))

    average_error = sum(errors_for_this_weighting) / n_iterations

    if average_error == 0: return average_error

    best_average_error = (average_error if not best_average_error else
      min(average_error, best_average_error))

  return best_average_error

def main():
  with open('data.txt') as input_file:
    while True:
        data = eval(input_file.readline())
        expected_result = eval(input_file.readline())
        spacer = input_file.readline()
        if not spacer:
          break
        result = avg_error(data)
        print expected_result, result, (expected_result - result) < 1e-9

if __name__ == '__main__':
    main()

My question

I am not asking for a code review of my solution because this would be the wrong StackExchange forum for that. I would post my solution to "Code Review" in that case.

My question instead is small, precise and unambiguous, fitting this site's format (hopefully).

In my code I generate a cartesian product of lists using itertools. In essence, I do not solve the crux of the problem myself but delegate the solution to a library that does that for me. I think this is the wrong approach to take if I want to learn from doing these exercises. I should be doing the hard part myself, otherwise why do the exercise at all? So I would like to ask you:

what is the best way to generate a cartesian product of some lists, not knowing in advance how many lists there are?

That's all I'd like to know, you can critique my code if you like. That's welcome, even if it passes all the tests (there is always a better way of doing things, especially if you are a beginner like me) but for this question to be "just right" for SO, I am focussing on just one aspect of the code, a concrete problem I have and something I am not happy with. Let me tell you more, I'll also share the canonical "what have you tried already"...

Clearly if I knew the number of lists I could just type in some nested for loops, like the top solvers of this exercise did in the competition. I tried writing a function that does this for an unknown number of lists but I was not sure which approach to take. The first approach was to write a recursive function. From list 1, take element 1 and combine it with element 1 of list 2, then element 1 of list 3, etc. I would push onto a stack the elements from each "layer" and pop them upon reaching the desired depth. I would imagine I would not fear a "stack overflow" because the depth reachable would be reasonable. I then struggled to choose a data structure to do this in the most efficient (memory/space) way possible, without passing too many parameters to the recursive calls. Should the data structure exist outside of the calls? Be passed around in the calls? Could I achieve any level of parallelism? How? With so many questions and so few answers I realised I needed to just know more to solve this problem and I could use a nudge in the right direction. You could provide a code snippet and I would study it. Or just explain to me what is the right "Computer Science" way of handling this type of problem. I am sure there is something I am not considering.

Finally, the thing that I did consider in my solution above is that thankfully filter filters a generator so the full cartesian product is never kept in memory (as it would if I did a list(ALL_PERMUTATIONS_OF_WEIGHTS) at any time in the code) so I am occupying space in memory only for those combinations which can actually be used as a weighted sum. A similar caution would be nice if applied to whatever system allows me to generate the cartesian product without using itertools.

Upvotes: 7

Answers (5)

hobbs

Reputation: 239930

Here's a favorite (and pedagogically decent, I hope) way of implementing the cartesian product in terms of reduce, translated from a Perl version that I wrote some time ago:

def cartesian_product(*X):
  return reduce(
    lambda accum, lst: 
      [ tup + (item,) for tup in accum for item in lst ],
    X,
    [()]
  )

It's similar to hayden's answer, except that it uses reduce instead of explicit recursion, which I think makes the base case a lot clearer. What we're reducing here is a list of tuples (the accumulated output, accum) against a list of items (lst). For every item in the list of items, we concatenate it to the end of all of the accumulated tuples, and repeat this process for as many lists (X) as there are. The reduce initializer is [()], a list containing one empty tuple, which ensures that if X[0] is [1, 2, 3] the accumulator will become [(1), (2), (3)] after the first step (one tuple because we want each item in X[0] once, and a zero-tuple because we want it to be concatenated to nothing). This corresponds to the "nullary product" mentioned by senderle in a comment to icktoofay's answer.

Given this function definition, if you print cartesian_product([1,2], [3,4], [5,6]) it will print:

[(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]

which are the 8 tuples we expected.

Upvotes: 1

Lord Henry Wotton

Reputation: 1362

Itertools to the rescue. The following will create combinations as they are used one-by-one:

import itertools
combs=itertools.product(*lists)

E. g. using command line Python, and assuming you have a list of lists of variable length:

>>> c=[['3', '5', '7'], ['100'], ['1', '2', '3']]
>>> z=itertools.product(*c)
>>> for ii in z:
...     print ii
... 
('3', '100', '1')
('3', '100', '2')
('3', '100', '3')
('5', '100', '1')
('5', '100', '2')
('5', '100', '3')
('7', '100', '1')
('7', '100', '2')
('7', '100', '3')

Upvotes: 0

icktoofay

Reputation: 129001

First, consider an n-list cartesian product. Let's take the first list, which we'll call L. Then we'll take the rest of the lists, which we'll call R. Then, for each item in L, prepend that to the beginning of each tuple yielded by the cartesian product of R.

With that, you can solve the problem just by implementing the cartesian product of no lists.

Here's a Haskell implementation, in case it helps you understand what I'm saying:

cartesian :: [[a]] -> [[a]]
cartesian [] = [[]]
cartesian (xs:yss) = [x : ys | x <- xs, ys <- cartesian yss]

Upvotes: 3

max

Reputation: 52263

Think of how numbers are written (in the decimal system, or in any other system). Include zero's even if you don't need them:

You can see how this looks like a cartesian product of 5 lists list(range(10)) (in this particular case). You can generate this output very easily by incrementing the "lowest" digit, and when it reaches the last one in the list, setting it to the first element and incrementing the "next highest" digit. You would still need for loops, of course, but a very small number. Use a similar approach when you work with arbitrary number of arbitrary lists.

For example, if you have 3 lists: ['a', 'b', 'c'], ['x', 'y'], ['1', '2'], you'll get:

ax1
ax2
ay1
ay2
bx1
bx2
by1
by2
cy1
cy2
cx1
cx2

Good luck!

EDIT:

If you like, here's a sample code to do this. I don't recursion just to show how simple this is. Recursion, of course, is also a great approach.

def lex_gen(bounds):
    elem = [0] * len(bounds)
    while True:
        yield elem
        i = 0
        while elem[i] == bounds[i] - 1:
            elem[i] = 0
            i += 1
            if i == len(bounds):
                raise StopIteration
        elem[i] += 1

def cart_product(lists):
    bounds = [len(lst) for lst in lists]
    for elem in lex_gen(bounds):
        yield [lists[i][elem[i]] for i in range(len(lists))]


for k in cart_product([['1', '2'], ['x', 'y'], ['a', 'b', 'c']]):
    print(k)

Upvotes: 5

Andy Hayden

Reputation: 375535

Classically, Cartesian coordinates are (x,y) in the plane or (x,y,z) in 3D-space (for x, y and z in the real numbers):

[ (x,y) for x in reals for y in reals ]

More generally they are tuples (as a Python list comprehension):

[ (x1, x2, x3, ...) for x1 in X1 for x2 in X2 for x3 in X3 ...]

For objects (iterables in our case) X1, X2, X3,..., and we would like is a function:

def cartesian_product(X1,X2,X3,...):
     return # the above list

One way to do this is to use recursion, taking care to always return tuples:

def cartesian_product(*X):
    if len(X) == 1: #special case, only X1
        return [ (x0,) for x0 in X[0] ]
    else:
        return [ (x0,)+t1 for x0 in X[0] for t1 in cartesian_product(*X[1:]) ]

cartesian_product([1,2],[3,4],[5,6])
# [(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]