Reputation: 163
I am facing a surprising challenge with Python.
I am a Physicist generating a series of simulations of layers at an optical interface. The details of the simulations are not specifically important but what is crucial is that I generate all possible cases are generated - different materials within a range of thicknesses and layer orders.
I have been writing code to generate a comprehensive and unique list but I am staggered at how long it takes to compute even relatively simple systems! Surely Python and a reasonable computer should handle this without excessive stress. Suggestions would be greatly appreciated.
Thank you
from itertools import permutations, combinations_with_replacement
def join_adjacent_repeated_materials(potential_structure):
"""
Self-explanitory...
"""
#print potential_structure
new_layers = [] # List to hold re-cast structure
for layer in potential_structure:
if len(new_layers) > 0: # if not the first item in the list of layers
last_layer=new_layers[-1] # last element of existing layer list
if layer[0] == last_layer[0]: # true is the two layers are the same material
combined_layer = (layer[0], layer[1] + last_layer[1])
new_layers[len(new_layers)-1] = combined_layer
else: # adjcent layers are different material so no comibantion is possible
new_layers.append(layer)
else: # for the first layer
new_layers.append(layer)
return tuple(new_layers)
def calculate_unique_structure_lengths(thicknesses, materials, maximum_number_of_layers,\
maximum_individual_layer_thicknesses, \
maximum_total_material_thicknesses):
"""
Create a set on all possible multilayer combinations.
thicknesses : if this contains '0' the total number of layers will vary
from 0 to maximum_number_of_layers, otherwise, the
number total number layers will always be maximum_number_of_layers
e.g. arange(0 , 100, 5)
materials : list of materials used
e.g. ['Metal', 'Dielectric']
maximum_number_of_layers : pretty self-explanitory...
e.g. 5
maximum_individual_layer_thicknesses : filters the created the multilayer structures
preventing the inclusion layers that are too thick
- this is important after the joining of
adjacent materials
e.g. (('Metal',30),('Dielectric',20))
maximum_total_material_thicknesses : similar to the above but filters structures where the total
amount of a particular material is exceeded
e.g. (('Metal',50),('Dielectric',100))
"""
# generate all possible thickness combinations and material combinations
all_possible_thickness_sets = set(permutations(combinations_with_replacement(thicknesses, maximum_number_of_layers)))
all_possible_layer_material_orders = set(permutations(combinations_with_replacement(materials, maximum_number_of_layers)))
first_set = set() # Create set object (list of unique elements, no repeats)
for layer_material_order in all_possible_layer_material_orders:
for layer_thickness_set in all_possible_thickness_sets:
potential_structure = [] # list to hold this structure
for layer, thickness in zip(layer_material_order[0], layer_thickness_set[0]): # combine the layer thickness with its material
if thickness != 0: # layers of zero thickness are not added to potential_structure
potential_structure.append((layer, thickness))
first_set.add(tuple(potential_structure)) # add this potential_structure to the first_set set
#print('first_set')
#for struct in first_set:
# print struct
## join adjacent repeated materials
second_set = set() # create new set
for potential_structure in first_set:
second_set.add(join_adjacent_repeated_materials(potential_structure))
## remove structures where a layer is too thick
third_set = set()
for potential_structure in second_set: # check all the structures in the set
conditions_satisfied=True # default
for max_condition in maximum_individual_layer_thicknesses: # check this structure using each condition
for layer in potential_structure: # examine each layer
if layer[0] == max_condition[0]: # match condition with material
if layer[1] > max_condition[1]: # test thickness condition
conditions_satisfied=False
if conditions_satisfied:
third_set.add(potential_structure)
##remove structures that contain too much of a certain material
fourth_set = set()
for potential_structure in second_set: # check all the structures in the set
conditions_satisfied=True # default
for max_condition in maximum_total_material_thicknesses: # check this structure using each condition
amount_of_material_in_this_structure = 0 # initialise a counter
for layer in potential_structure: # examine each layer
if layer[0] == max_condition[0]: # match condition with material
amount_of_material_in_this_structure += layer[1]
if amount_of_material_in_this_structure > max_condition[1]: # test thickness condition
conditions_satisfied=False
if conditions_satisfied:
fourth_set.add(potential_structure)
return fourth_set
thicknesses = [0,1,2]
materials = ('A', 'B') # Tuple cannot be accidentally appended to later
maximum_number_of_layers = 3
maximum_individual_layer_thicknesses=(('A',30),('B',20))
maximum_total_material_thicknesses=(('A',20),('B',15))
calculate_unique_structure_lengths(thicknesses, materials, maximum_number_of_layers,\
maximum_individual_layer_thicknesses = maximum_individual_layer_thicknesses, \
maximum_total_material_thicknesses = maximum_total_material_thicknesses)
Upvotes: 2
Views: 139
Reputation: 123491
FWIW I instrumented your code to enable profiling. Here are the results:
Output:
Sun May 25 16:06:31 2014 surprising-challenge-generating-comprehensive-python-list.stats
348,365,046 function calls in 1,538.413 seconds
Ordered by: cumulative time, internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1052.933 1052.933 1538.413 1538.413 surprising-challenge-generating-comprehensive-python-list.py:34(calculate_unique_structure_lengths)
87091200 261.764 0.000 261.764 0.000 {zip}
87091274 130.492 0.000 130.492 0.000 {method 'add' of 'set' objects}
174182440 93.223 0.000 93.223 0.000 {method 'append' of 'list' objects}
30 0.000 0.000 0.000 0.000 surprising-challenge-generating-comprehensive-python-list.py:14(join_adjacent_repeated_materials)
100 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
To get an even finer grained picture of where the code was spending it's time, I used the line_profiler module on an almost verbatim copy of your code and got the following results for each of your functions:
> python "C:\Python27\Scripts\kernprof.py" -l -v surprising-challenge-generating-comprehensive-python-list.py
Wrote profile results to example.py.lprof
Timer unit: 3.2079e-07 s
File: surprising-challenge-generating-comprehensive-python-list.py
Function: join_adjacent_repeated_materials at line 3
Total time: 0.000805183 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
3 @profile
4 def join_adjacent_repeated_materials(potential_structure):
5 """
6 Self-explanitory...
7 """
8 #print potential_structure
9
10 30 175 5.8 7.0 new_layers = [] # List to hold re-cast structure
11 100 544 5.4 21.7 for layer in potential_structure:
12 70 416 5.9 16.6 if len(new_layers) > 0: # if not the first item in the list of layers
13 41 221 5.4 8.8 last_layer=new_layers[-1] # last element of existing layer list
14 41 248 6.0 9.9 if layer[0] == last_layer[0]: # true is the two layers are the same material
15 30 195 6.5 7.8 combined_layer = (layer[0], layer[1] + last_layer[1])
16 30 203 6.8 8.1 new_layers[len(new_layers)-1] = combined_layer
17 else: # adjcent layers are different material so no comibantion is possible
18 11 68 6.2 2.7 new_layers.append(layer)
19 else: # for the first layer
20 29 219 7.6 8.7 new_layers.append(layer)
21
22 30 221 7.4 8.8 return tuple(new_layers)
File: surprising-challenge-generating-comprehensive-python-list.py
Function: calculate_unique_structure_lengths at line 24
Total time: 3767.94 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
24 @profile
25 def calculate_unique_structure_lengths(thicknesses, materials, maximum_number_of_layers,\
26 maximum_individual_layer_thicknesses, \
27 maximum_total_material_thicknesses):
28 """
29 Create a set on all possible multilayer combinations.
30
31 thicknesses : if this contains '0' the total number of layers will vary
32 from 0 to maximum_number_of_layers, otherwise, the
33 number total number layers will always be maximum_number_of_layers
34 e.g. arange(0 , 100, 5)
35
36 materials : list of materials used
37 e.g. ['Metal', 'Dielectric']
38
39 maximum_number_of_layers : pretty self-explanitory...
40 e.g. 5
41
42 maximum_individual_layer_thicknesses : filters the created the multilayer structures
43 preventing the inclusion layers that are too thick
44 - this is important after the joining of
45 adjacent materials
46 e.g. (('Metal',30),('Dielectric',20))
47
48 maximum_total_material_thicknesses : similar to the above but filters structures where the total
49 amount of a particular material is exceeded
50 e.g. (('Metal',50),('Dielectric',100))
51
52
53 """
54 # generate all possible thickness combinations and material combinations
55 1 20305240 20305240.0 0.2 all_possible_thickness_sets = set(permutations(combinations_with_replacement(thicknesses, maximum_number_of_layers)))
56 1 245 245.0 0.0 all_possible_layer_material_orders = set(permutations(combinations_with_replacement(materials, maximum_number_of_layers)))
57
58
59 1 13 13.0 0.0 first_set = set() # Create set object (list of unique elements, no repeats)
60 25 235 9.4 0.0 for layer_material_order in all_possible_layer_material_orders:
61 87091224 896927052 10.3 7.6 for layer_thickness_set in all_possible_thickness_sets:
62 87091200 920048586 10.6 7.8 potential_structure = [] # list to hold this structure
63 348364800 4160332176 11.9 35.4 for layer, thickness in zip(layer_material_order[0], layer_thickness_set[0]): # combine the layer thickness with its material
64 261273600 2334038439 8.9 19.9 if thickness != 0: # layers of zero thickness are not added to potential_structure
65 174182400 2003639625 11.5 17.1 potential_structure.append((layer, thickness))
66 87091200 1410517427 16.2 12.0 first_set.add(tuple(potential_structure)) # add this potential_structure to the first_set set
67
68 #print('first_set')
69 #for struct in first_set:
70 # print struct
71
72 ## join adjacent repeated materials
73 1 14 14.0 0.0 second_set = set() # create new set
74 31 274 8.8 0.0 for potential_structure in first_set:
75 30 5737 191.2 0.0 second_set.add(join_adjacent_repeated_materials(potential_structure))
76
77 ## remove structures where a layer is too thick
78 1 10 10.0 0.0 third_set = set()
79 23 171 7.4 0.0 for potential_structure in second_set: # check all the structures in the set
80 22 164 7.5 0.0 conditions_satisfied=True # default
81 66 472 7.2 0.0 for max_condition in maximum_individual_layer_thicknesses: # check this structure using each condition
82 104 743 7.1 0.0 for layer in potential_structure: # examine each layer
83 60 472 7.9 0.0 if layer[0] == max_condition[0]: # match condition with material
84 30 239 8.0 0.0 if layer[1] > max_condition[1]: # test thickness condition
85 conditions_satisfied=False
86 22 149 6.8 0.0 if conditions_satisfied:
87 22 203 9.2 0.0 third_set.add(potential_structure)
88
89 ##remove structures that contain too much of a certain material
90 1 10 10.0 0.0 fourth_set = set()
91 23 178 7.7 0.0 for potential_structure in second_set: # check all the structures in the set
92 22 158 7.2 0.0 conditions_satisfied=True # default
93 66 489 7.4 0.0 for max_condition in maximum_total_material_thicknesses: # check this structure using each condition
94 44 300 6.8 0.0 amount_of_material_in_this_structure = 0 # initialise a counter
95 104 850 8.2 0.0 for layer in potential_structure: # examine each layer
96 60 2961 49.4 0.0 if layer[0] == max_condition[0]: # match condition with material
97 30 271 9.0 0.0 amount_of_material_in_this_structure += layer[1]
98 30 255 8.5 0.0 if amount_of_material_in_this_structure > max_condition[1]: # test thickness condition
99 conditions_satisfied=False
100 22 160 7.3 0.0 if conditions_satisfied:
101 22 259 11.8 0.0 fourth_set.add(potential_structure)
102
103 1 16 16.0 0.0 return fourth_set
As you can see constructing thefirst_set
incalculate_unique_structure_lengths()
is by far the most time-consuming step.
Upvotes: 0
Reputation: 281538
all_possible_thickness_sets = set(permutations(combinations_with_replacement(thicknesses, maximum_number_of_layers)))
all_possible_layer_material_orders = set(permutations(combinations_with_replacement(materials, maximum_number_of_layers)))
Holy crap! These sets are going to be huge! Let's give an example. If thicknesses
has 6 things in it and maximum_number_of_layers
is 3, then the first set is going to have about 2 quintillion things in it. Why are you doing this? If these are really the sets you want to use, you're going to need to find an algorithm that doesn't need to build these sets, because it's never going to happen. I suspect these aren't the sets you want; perhaps you wanted itertools.product
?
all_possible_thickness_sets = set(product(thicknesses, repeat=maximum_number_of_layers))
Here's an example of what itertools.product
does:
>>> for x in product([1, 2, 3], repeat=2):
... print x
...
(1, 1)
(1, 2)
(1, 3)
(2, 1)
(2, 2)
(2, 3)
(3, 1)
(3, 2)
(3, 3)
Does that look like what you need?
Upvotes: 2
Reputation: 5144
So one thing you do really often is to add something to a set
. If you look at the runtime behaviour of sets in the Python documentation, it says, quite on the bottom, about worst cases "Individual actions may take surprisingly long, depending on the history of the container". I think memory reallocation may bite you if you add a lot of elements, because Python has no way of knowing how much memory to reserve when you start.
The more I look at it, the more I think you're reserving more memory than you have to. third_set
for example doesn't even get used. second_set
could be replaced by first_set
if you'd just call join_adjacent_materials
directly. And if I read it correctly, even first_set
could go away and you could just create the candidates as you construct fourth_set
.
Of course the code may become less readable if you put everything into a single bunch of nested loops. However, there are ways to structure your code that don't create unnecessary objects just for readability - you could for example create a function which generates candidates and return each result via yield
.
Upvotes: 0