Reputation: 3974
I am trying to calculate the origin and offset of variable-size arrays and store them in a dictionary. Below is the likely non-Pythonic way I am doing this now. I am not sure whether I should be using map, a lambda function, or list comprehensions to make the code more Pythonic.
Essentially, I need to cut an array into chunks based on its total size and store xstart, ystart, x_number_of_rows_to_read, and y_number_of_columns_to_read for each chunk in a dictionary. The total size is variable. I cannot load the entire array into memory and use numpy indexing, or I definitely would. The origin and offset are used to read each chunk into numpy.
from collections import defaultdict

intervalx = xsize / xsegment  # Get the size of the chunks
intervaly = ysize / ysegment  # Get the size of the chunks
# Set up to segment the image, storing the start values and key in a dictionary.
xstart = 0
ystart = 0
key = 0
d = defaultdict(list)
for y in xrange(0, ysize, intervaly):
    if y + (intervaly * 2) < ysize:
        numberofrows = intervaly
    else:
        numberofrows = ysize - y
    for x in xrange(0, xsize, intervalx):
        if x + (intervalx * 2) < xsize:
            numberofcolumns = intervalx
        else:
            numberofcolumns = xsize - x
        l = [x, y, numberofcolumns, numberofrows]
        d[key].append(l)
        key += 1
return d
I realize that xrange is not ideal for a port to Python 3.
Upvotes: 6
Views: 2859
Reputation: 309959
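This code looks fine except for your use of defaultdict. A list seems like a much better data structure here because your keys are just sequential integers and each key only ever holds a one-element list.
One thing you could do is collapse each if/else into a conditional expression.
Here's a modified version of your code with my few suggestions.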
intervalx = xsize / xsegment  # Get the size of the chunks
intervaly = ysize / ysegment  # Get the size of the chunks
# Set up to segment the image, storing the start values in a list.
xstart = 0
ystart = 0
output = []
for y in xrange(0, ysize, intervaly):
    numberofrows = intervaly if y + (intervaly * 2) < ysize else ysize - y
    for x in xrange(0, xsize, intervalx):
        numberofcolumns = intervalx if x + (intervalx * 2) < xsize else xsize - x
        lst = [x, y, numberofcolumns, numberofrows]
        output.append(lst)
        # If it doesn't make any difference to your program, the above 2 lines could read:
        # tple = (x, y, numberofcolumns, numberofrows)
        # output.append(tple)
        # This will be slightly more efficient
        # (tuple creation is faster than list creation)
        # and less memory hungry. In other words, if it doesn't need to be a list due
        # to other constraints (e.g. you append to it later), you should make it a tuple.
Now to get your data, you can do offset_list=output[5] instead of offset_list=d[5][0].
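As a rough sketch of the list-based approach, the loop can be wrapped in a function so it runs on its own. The name `chunk_offsets` and the sample sizes are made up for illustration. The sketch assumes Python 3 (so `range` and `//` rather than `xrange` and `/`), and it clamps the edge chunks with `min()` instead of the `interval * 2` test, on the assumption that non-overlapping chunks are what is wanted.

```python
def chunk_offsets(xsize, ysize, xsegment, ysegment):
    """Return a list of (x, y, numberofcolumns, numberofrows) tuples.

    Hypothetical helper, not part of the original code. Assumes all
    arguments are positive integers with xsegment <= xsize and
    ysegment <= ysize.
    """
    intervalx = xsize // xsegment  # nominal chunk width
    intervaly = ysize // ysegment  # nominal chunk height
    output = []
    for y in range(0, ysize, intervaly):
        # Clamp so the last row of chunks stops at the array edge.
        numberofrows = min(intervaly, ysize - y)
        for x in range(0, xsize, intervalx):
            # Same clamp for the last column of chunks.
            numberofcolumns = min(intervalx, xsize - x)
            output.append((x, y, numberofcolumns, numberofrows))
    return output


offsets = chunk_offsets(xsize=10, ysize=4, xsegment=3, ysegment=2)
print(offsets[5])  # (3, 2, 3, 2)
```

When a size is not an exact multiple of the interval, this yields one extra, narrower chunk at the edge (here a 1-column chunk starting at x=9).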
Upvotes: 7
Reputation: 2804
This is a long one-liner:
d = [(x, y, min(x + intervalx, xsize) - x, min(y + intervaly, ysize) - y)
     for x in xrange(0, xsize, intervalx)
     for y in xrange(0, ysize, intervaly)]
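To see what the `min()` clamp does differently from the `interval * 2` test in the question, here is a small one-dimensional comparison. The function names and sample numbers are invented for this illustration, and the code assumes Python 3.

```python
def lengths_with_min(size, interval):
    """(start, length) pairs using the min() clamp from the one-liner."""
    return [(start, min(start + interval, size) - start)
            for start in range(0, size, interval)]


def lengths_with_double_test(size, interval):
    """(start, length) pairs using the `start + interval * 2 < size` test."""
    return [(start, interval if start + interval * 2 < size else size - start)
            for start in range(0, size, interval)]


print(lengths_with_min(9, 3))          # [(0, 3), (3, 3), (6, 3)]
print(lengths_with_double_test(9, 3))  # [(0, 3), (3, 6), (6, 3)]
```

With the clamp, the lengths add up to the total size. With the double test, the second-to-last chunk runs all the way to the edge and overlaps the last one. The question does not say whether that overlap is intended, so which rule is right depends on how the chunks are consumed.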
Upvotes: 0
Reputation: 68682
Have you considered using np.memmap to load the pieces dynamically instead? You would then just need to determine the offsets you need on the fly, rather than chunking the array and storing the offsets.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html
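A minimal sketch of that idea follows. It only applies if the data sits in a raw binary file with a known dtype and shape, which the question does not state, so treat the file layout, file name, and sizes below as assumptions made up for the example.

```python
import os
import tempfile

import numpy as np

# Assumed layout for this sketch: a raw, row-major float32 file with no header.
rows, cols = 6, 8
path = os.path.join(tempfile.mkdtemp(), "image.dat")  # hypothetical file
np.arange(rows * cols, dtype=np.float32).reshape(rows, cols).tofile(path)

# Map the file without reading it all; data is fetched only when sliced.
image = np.memmap(path, dtype=np.float32, mode="r", shape=(rows, cols))

# Pull one chunk on the fly from its origin and size.
x, y, numberofcolumns, numberofrows = 4, 2, 3, 2
chunk = np.array(image[y:y + numberofrows, x:x + numberofcolumns])
print(chunk)
```

Wrapping the slice in `np.array` copies just that chunk into memory, so the rest of the file is never loaded.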
Upvotes: 0
Reputation: 2440
Although it doesn't change your algorithm, a more pythonic way to write your if/else statements is:
numberofrows = intervaly if y + intervaly * 2 < ysize else ysize - y
instead of this:
if y + (intervaly * 2) < ysize:
    numberofrows = intervaly
else:
    numberofrows = ysize - y
(and similarly for the other if/else statement).
Upvotes: 0