Cynthia Flores

Reputation: 21

Iterate through ndarray in Python

I have an ndarray with shape (720, 100, 100). It holds monthly data for 60 years, so the months can be grouped 12 per year. The 720 entries along the first axis are the months from 1958 through 2017 (inclusive), while (100, 100) are the rows and columns of data. Thus, the first twelve arrays belong to 1958, the next twelve to 1959, and so on. It is worth mentioning that some of the "cells" in these rows and columns are empty, so the ndarray is stored as a masked array (not a problem, since all of the data is like this).

The problem I am experiencing is this: how can I iterate through this ndarray and "pack" all the Januaries, Februaries, Marches and so on together? The new ndarray would have the same shape (720, 100, 100), but the first 60 arrays would belong to January, the next 60 to February and so on, across all years.

I have not included my code because it is complete trash for this operation.

Upvotes: 1

Views: 1380

Answers (3)

Daniel

Reputation: 263

Following what @juanpa.arrivillaga has already posted, you can group the array by month with the following function.

import numpy as np

def dataMonth(_data, _months):
    """Return the original data grouped by month across the years.

    Returns the concatenated array and a list with one view per month.
    """
    # All Januaries first, then all Februaries, and so on.
    # Credits to juanpa.arrivillaga
    _grouped_months = np.concatenate([_data[i::12] for i in range(12)])

    # One strided view per requested month: _data[i::12] is month i of every year.
    _ind_months = [_data[i::12] for i in range(_months)]

    return _grouped_months, _ind_months

Upvotes: 0

juanpa.arrivillaga

Reputation: 96349

So, notice, if you wanted all Januaries, you could use numpy.ndarray slicing on the first dimension thusly:

jans = arr[::12]

And all the Februaries:

febs = arr[1::12]
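
To see which positions these slices pick, here is a small sketch on an index array rather than real data. It assumes the layout described in the question (720 months starting in January 1958); the names n_months, jan_idx, feb_idx and jan_years are made up for illustration.

```python
import numpy as np

n_months = 720                 # 60 years x 12 months, as in the question
idx = np.arange(n_months)      # position of each month along the first axis

jan_idx = idx[::12]            # 0, 12, 24, ... (every January)
feb_idx = idx[1::12]           # 1, 13, 25, ... (every February)

# Recover the calendar year of each January from its position.
jan_years = 1958 + jan_idx // 12

print(jan_idx[:3], feb_idx[:3], jan_years[0], jan_years[-1])
```

Each slice has one entry per year, so 60 entries here.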

So, if you really want an array "grouped by months", the simple thing would be (using an array with only 10 years' worth of data just to keep things a bit simpler):

>>> import numpy as np
>>> x = np.arange(120*100*100, dtype=np.int32).reshape(120, 100, 100)
>>> grouped  = np.concatenate([x[i::12] for i in range(12)]) # O(n) operation!

The nice thing about this is that it takes O(N) time: it makes only a single (albeit slightly indirect) sweep through the original array, since slicing numpy.ndarray objects produces views. It is also fairly space-efficient, requiring only a bit more than twice the space of the original (the new array plus some auxiliary space for the views). Notice:

>>> grouped[:12][0]
array([[   0,    1,    2, ...,   97,   98,   99],
       [ 100,  101,  102, ...,  197,  198,  199],
       [ 200,  201,  202, ...,  297,  298,  299],
       ...,
       [9700, 9701, 9702, ..., 9797, 9798, 9799],
       [9800, 9801, 9802, ..., 9897, 9898, 9899],
       [9900, 9901, 9902, ..., 9997, 9998, 9999]], dtype=int32)
>>> grouped[:12][1]
array([[120000, 120001, 120002, ..., 120097, 120098, 120099],
       [120100, 120101, 120102, ..., 120197, 120198, 120199],
       [120200, 120201, 120202, ..., 120297, 120298, 120299],
       ...,
       [129700, 129701, 129702, ..., 129797, 129798, 129799],
       [129800, 129801, 129802, ..., 129897, 129898, 129899],
       [129900, 129901, 129902, ..., 129997, 129998, 129999]], dtype=int32)
>>> grouped[:12][2]
array([[240000, 240001, 240002, ..., 240097, 240098, 240099],
       [240100, 240101, 240102, ..., 240197, 240198, 240199],
       [240200, 240201, 240202, ..., 240297, 240298, 240299],
       ...,
       [249700, 249701, 249702, ..., 249797, 249798, 249799],
       [249800, 249801, 249802, ..., 249897, 249898, 249899],
       [249900, 249901, 249902, ..., 249997, 249998, 249999]], dtype=int32)
>>> grouped[:12][-1]
array([[130000, 130001, 130002, ..., 130097, 130098, 130099],
       [130100, 130101, 130102, ..., 130197, 130198, 130199],
       [130200, 130201, 130202, ..., 130297, 130298, 130299],
       ...,
       [139700, 139701, 139702, ..., 139797, 139798, 139799],
       [139800, 139801, 139802, ..., 139897, 139898, 139899],
       [139900, 139901, 139902, ..., 139997, 139998, 139999]], dtype=int32)
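
Since the array in the question is a masked array, it may be safer to use np.ma.concatenate, which joins the masks together with the data; plain np.concatenate may not carry the mask over. A minimal sketch on a small made-up array (the sizes and the masked cell are assumptions for illustration):

```python
import numpy as np

# Hypothetical stand-in for the question's data: 2 years of 3x3 monthly grids.
n_years, n_rows, n_cols = 2, 3, 3
values = np.arange(n_years * 12 * n_rows * n_cols, dtype=np.float64)
values = values.reshape(n_years * 12, n_rows, n_cols)

# Mark one "empty" cell in every month (an assumption made for this example).
mask = np.zeros(values.shape, dtype=bool)
mask[:, 0, 0] = True
arr = np.ma.masked_array(values, mask=mask)

# Same grouping as above, but the masks are concatenated along with the data.
grouped = np.ma.concatenate([arr[i::12] for i in range(12)])

print(type(grouped).__name__, grouped.shape, int(grouped.mask.sum()))
```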

There might be a more clever way with numpy.reshape, but I'll let the numpy champs around here try to figure that out. The above seems like a decent solution to me.
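
One possible reshape-based version, as a sketch rather than a definitive answer: split the first axis into (year, month), swap those two axes so that month comes first, then flatten back. The names n_years, by_month and grouped_reshape are made up here, and a smaller 10x10 grid is used to keep the example light.

```python
import numpy as np

n_years = 10
x = np.arange(n_years * 12 * 10 * 10, dtype=np.int32).reshape(n_years * 12, 10, 10)

# (120, 10, 10) -> (year, month, 10, 10) -> (month, year, 10, 10)
by_month = x.reshape(n_years, 12, *x.shape[1:]).swapaxes(0, 1)

# Flatten (month, year) back into one axis; this step copies the data.
grouped_reshape = by_month.reshape(-1, *x.shape[1:])

# Compare against the concatenate-based grouping.
grouped_concat = np.concatenate([x[i::12] for i in range(12)])
same = np.array_equal(grouped_reshape, grouped_concat)
print(same)
```

The intermediate by_month array is also handy on its own: by_month[m] holds every year's data for month m.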

An alternative you might consider is a mapping from month number to a view:

month_mapping = {i: x[i::12] for i in range(12)} # O(1) operation

Now, this would be very efficient, since the creation of the mapping would be constant time, and you would only need the auxiliary space for the dict (a few hundred bytes) and the 12 numpy.ndarray objects that would be views on the original data. If speed of iterating over this is more important, I would go with the above approach, since creating a new array in this form would increase the locality of reference.
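
A quick way to check that such a mapping really holds views and not copies is np.shares_memory. This is a small sketch on a made-up 4x4 grid; note the consequence that writing through a view also changes the original array.

```python
import numpy as np

x = np.arange(120 * 4 * 4, dtype=np.int32).reshape(120, 4, 4)
month_mapping = {i: x[i::12] for i in range(12)}

# Every value in the dict should share its buffer with x.
shares = all(np.shares_memory(x, view) for view in month_mapping.values())

# Writing through the January view modifies x itself.
month_mapping[0][0, 0, 0] = -1

print(shares, x[0, 0, 0], month_mapping[3].shape)
```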

Upvotes: 1

jstein123

Reputation: 454

Assuming your array a stores the month as an integer from 0-11, something like this would work for grouping the rows into separate arrays by month:

import numpy as np

month_arrays = {i: None for i in range(12)}  # map each month number to its array
a = ...  # your array
for row in a:
    month_index = row[0][0]  # month number stored in the row's first element
    if month_arrays[month_index] is None:  # "== None" is ambiguous for arrays
        month_arrays[month_index] = row
    else:
        # np.vstack returns a new array, so the result has to be stored
        month_arrays[month_index] = np.vstack([month_arrays[month_index], row])

Upvotes: 0
