Florida Man

Reputation: 2147

Operating on the columns of 2D NumPy arrays stored in a list

I have a list whose entries are NumPy arrays (2D in this case). Example data:

   x=list([np.array([[1,2,3],[11,12,13],[111,112,113]]),np.array([[4,5,6],[14,15,16],[114,115,116],[1114,1115,1116]]),np.array([[11,12,13],[111,112,113]]),np.array([[7,8,9],[17,18,19],[117,118,119],[1117,1118,1119]])])

I want to apply functions to each column of each NumPy array separately. Some functions have an axis argument built in, but some do not, e.g. MinMaxScaler.

So far I have created this list comprehension:

   from sklearn.preprocessing import MinMaxScaler
   scaler = MinMaxScaler(feature_range=(0, 1))
   Data=list()
   Data=[[(scaler.fit_transform(np.reshape(x[i][:,j],(-1,1)))) for j in range(x[i].shape[1])] for i in range(len(x))]     

The problem here is that the list comprehension produces a nested list holding one single-column NumPy array per iteration, rather than one 2D array per input matrix.

I tried to use hstack and iterate over the list length.

   Data=list()
   L=list(range(len(x)))
   for k in range(len(x)):
          L[k]=np.zeros([x[k].shape[0],x[k].shape[1]])

   Data=[[np.hstack((L[i],scaler.fit_transform(np.reshape(x[i][:,j],(-1,1))))) for j in range(x[i].shape[1])] for i in range(len(x))]   

But that does not work at all. Of course, it stacks onto the existing zeros in L, and it creates another list per iteration.

Other initializations of L did not work either, although that is not the main problem:

   L=list() #IndexError: list index out of range
   L=list(None)*len(x) #TypeError: 'NoneType' object is not iterable
   L=list(range(len(x))) #ValueError: all the input arrays must have same number of dimensions
   #...and others tried

Does anyone have an idea how to solve this or do I have to do this with the classic for loops?

Thanks for your help

Upvotes: 0

Views: 50

Answers (3)

Florida Man

Reputation: 2147

I found the answer. It is probably not the most elegant one, but it works. If anyone can translate it into a more Pythonic form with a list comprehension, that would be appreciated but is not necessary.

With x:

   x=list([np.array([[1,2,3],[11,12,13],[111,112,113]]),np.array([[4,5,6],[14,15,16],[114,115,116],[1114,1115,1116]]),np.array([[11,12,13],[111,112,113]]),np.array([[7,8,9],[17,18,19],[117,118,119],[1117,1118,1119]])])

Version with a helper function, so the transformation is interchangeable:

   def theFunction(values,f):
           values=f.fit_transform(np.reshape(values,(-1,1)))
           return values

   from sklearn.preprocessing import MinMaxScaler
   scaler = MinMaxScaler(feature_range=(0, 1)) #define function 
   data = [0]*len(x)  # one slot per matrix in x

   for matrix,i in zip(x,range(len(x))):  # iterate through every matrix in the list           
       for column in matrix.transpose():  # iterate through every column in the matrix
           col=theFunction(column,scaler) 
           if 'Matrx' in locals():
                  Matrx=np.hstack((Matrx,col)) 
           else:
                  Matrx=col  
       data[i]=Matrx 
       del Matrx

Without the helper function, defining what to do inside the loop itself:

   from sklearn.preprocessing import MinMaxScaler    
   scaler = MinMaxScaler(feature_range=(0, 1)) #define function 
   data = [0]*len(x)  # one slot per matrix in x

   for matrix,i in zip(x,range(len(x))):  # iterate through every matrix in the list           
       for column in matrix.transpose():  # iterate through every column in the matrix
           col=scaler.fit_transform(np.reshape(column,(-1,1)))
           if 'Matrx' in locals():
                  Matrx=np.hstack((Matrx,col)) 
           else:
                  Matrx=col  
       data[i]=Matrx 
       del Matrx
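
The loop above can probably be condensed into a single list comprehension. The following is only a sketch: `min_max_column` is a hypothetical NumPy-only stand-in for `scaler.fit_transform(np.reshape(column, (-1, 1)))`, used here so the example runs without scikit-learn, and any function that returns an (n, 1) array could take its place.

```python
import numpy as np


def min_max_column(column):
    """Hypothetical stand-in for the scaler: map one column onto [0, 1]."""
    column = np.asarray(column, dtype=float).reshape(-1, 1)
    low, high = column.min(), column.max()
    span = high - low
    # Guard against constant columns, where the span would be zero.
    return (column - low) / (span if span != 0 else 1.0)


x = [
    np.array([[1, 2, 3], [11, 12, 13], [111, 112, 113]]),
    np.array([[4, 5, 6], [14, 15, 16], [114, 115, 116], [1114, 1115, 1116]]),
    np.array([[11, 12, 13], [111, 112, 113]]),
    np.array([[7, 8, 9], [17, 18, 19], [117, 118, 119], [1117, 1118, 1119]]),
]

# One output matrix per input matrix: transform every column, then join the
# (n, 1) results side by side so the original shape is restored.
data = [
    np.hstack([min_max_column(column) for column in matrix.T])
    for matrix in x
]
```

This removes the need for the `'Matrx' in locals()` check and the `del`. As a side note, MinMaxScaler scales each feature (column) of a 2D input independently, so for this particular scaler `[scaler.fit_transform(matrix) for matrix in x]` may already give the same per-column result; the explicit column loop is mainly useful for functions without that behaviour.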

Upvotes: 0

hpaulj

Reputation: 231385

With your x (thanks for making it cut-n-paste friendly):

In [291]:  x=list([np.array([[1,2,3],[11,12,13],[111,112,113]]),np.array([[4,5,6
     ...: ],[14,15,16],[114,115,116],[1114,1115,1116]]),np.array([[11,12,13],[11
     ...: 1,112,113]]),np.array([[7,8,9],[17,18,19],[117,118,119],[1117,1118,111
     ...: 9]])])
In [292]: x
Out[292]: 
[array([[  1,   2,   3],
        [ 11,  12,  13],
        [111, 112, 113]]), array([[   4,    5,    6],
        [  14,   15,   16],
        [ 114,  115,  116],
        [1114, 1115, 1116]]), array([[ 11,  12,  13],
        [111, 112, 113]]), array([[   7,    8,    9],
        [  17,   18,   19],
        [ 117,  118,  119],
        [1117, 1118, 1119]])]
In [293]: len(x)
Out[293]: 4
In [294]: [i.shape for i in x]
Out[294]: [(3, 3), (4, 3), (2, 3), (4, 3)]

I haven't tried to digest your intended processing, but since the arrays have different shapes, I don't see how you can avoid processing each separately. They can't be combined into any sort of higher dimensional array.
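
To make the shape point concrete, here is a small sketch using placeholder arrays with the same shapes as in Out[294] (the zero-filled contents and the column-mean operation are just assumptions for illustration). Stacking fails, while a per-array comprehension works regardless of the row counts.

```python
import numpy as np

# Placeholder arrays with the shapes reported above: (3, 3), (4, 3), (2, 3), (4, 3).
x = [np.zeros((3, 3)), np.zeros((4, 3)), np.zeros((2, 3)), np.zeros((4, 3))]

try:
    np.stack(x)  # requires every array to have the same shape
    stacked = True
except ValueError:
    stacked = False  # the differing row counts prevent a 3D array

# Processing each array separately works fine, here with a column-wise mean.
col_means = [a.mean(axis=0) for a in x]
```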

I'm not going to try to apply fit_transform, but it is apparent that Data is a list of lists. I don't know what those inner lists contain.

Maybe it would help if you described the problem, possibly in a simplified form, with just one element of the x list. I prefer to run a concrete example, and look at the resulting arrays and lists in my own Python session. Word descriptions just aren't clear enough.

Upvotes: 0

FHTMitchell

Reputation: 12157

This should work (if I've understood correctly):

def f(column):
    ... # function you want to apply to each column

data = [f(column) for matrix in x for column in matrix.T]

It's a double for loop, equivalent to (but faster than)

data = []
for matrix in x:  # iterate through every matrix in the list
    for column in matrix.transpose():  # iterate through every column in the matrix
        data.append(f(column))
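
Note that both forms produce one flat list of transformed columns. If the results should stay grouped per matrix, one possible variation is `np.apply_along_axis`, sketched below. The `f` used here (shifting each column to start at zero) and the shortened `x` are assumptions for illustration only, and the approach assumes `f` returns a 1D array of the same length as its input.

```python
import numpy as np


def f(column):
    # Hypothetical example function: shift a column so its minimum is zero.
    return column - column.min()


# Shortened example data with differing row counts.
x = [
    np.array([[1, 2, 3], [11, 12, 13]]),
    np.array([[4, 5, 6], [14, 15, 16], [114, 115, 116]]),
]

# Flat result: one 1D array per column across all matrices.
flat = [f(column) for matrix in x for column in matrix.T]

# Grouped result: axis 0 means f receives each column; shapes are preserved.
grouped = [np.apply_along_axis(f, 0, matrix) for matrix in x]
```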

Upvotes: 1
