Reputation: 2147
I have a list whose entries are NumPy arrays (2D in this case). Example data:
import numpy as np
x=list([np.array([[1,2,3],[11,12,13],[111,112,113]]),np.array([[4,5,6],[14,15,16],[114,115,116],[1114,1115,1116]]),np.array([[11,12,13],[111,112,113]]),np.array([[7,8,9],[17,18,19],[117,118,119],[1117,1118,1119]])])
I want to apply functions to each column of each NumPy array separately. Some functions have an axis argument built in, but some do not, e.g. MinMaxScaler.
So far I have created this list comprehension:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
Data=list()
Data=[[(scaler.fit_transform(np.reshape(x[i][:,j],(-1,1)))) for j in range(x[i].shape[1])] for i in range(len(x))]
The problem here is that the list comprehension creates a new inner list with one separate single-column NumPy array per iteration, instead of one array per matrix.
I tried to use hstack and iterate over the list length:
Data=list()
L=list(range(len(x)))
for k in range(len(x)):
    L[k]=np.zeros([x[k].shape[0],x[k].shape[1]])
Data=[[np.hstack((L[i],scaler.fit_transform(np.reshape(x[i][:,j],(-1,1))))) for j in range(x[i].shape[1])] for i in range(len(x))]
But that does not work at all. Of course, it stacks onto the existing zeros in L, and it creates another list per iteration.
Other initializations of L did not work either, although that is not the main problem:
L=list() #IndexError: list index out of range
L=list(None)*len(x) #TypeError: 'NoneType' object is not iterable
L=list(range(len(x))) #ValueError: all the input arrays must have same number of dimensions
#...and others tried
Does anyone have an idea how to solve this or do I have to do this with the classic for loops?
Thanks for your help
Upvotes: 0
Views: 50
Reputation: 2147
I found the answer. It is probably not the sexiest one, but it works. If anyone can translate it into a more Pythonic version with a list comprehension, that would be appreciated, but it is not necessary.
With x:
x=list([np.array([[1,2,3],[11,12,13],[111,112,113]]),np.array([[4,5,6],[14,15,16],[114,115,116],[1114,1115,1116]]),np.array([[11,12,13],[111,112,113]]),np.array([[7,8,9],[17,18,19],[117,118,119],[1117,1118,1119]])])
Version with a helper function, so the applied function is interchangeable:
def theFunction(values,f):
    # reshape the 1D column to shape (n, 1), since fit_transform expects 2D input
    values=f.fit_transform(np.reshape(values,(-1,1)))
    return values
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1)) #define function
data =[0]*len(x)
for matrix,i in zip(x,range(len(x))): # iterate through every matrix in the list
    for column in matrix.transpose(): # iterate through every column in the matrix
        col=theFunction(column,scaler)
        if 'Matrx' in locals():
            Matrx=np.hstack((Matrx,col))
        else:
            Matrx=col
    data[i]=Matrx
    del Matrx # reset so the next matrix starts from scratch
Without a function, where you define what to do within the loop itself:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1)) #define function
data =[0]*len(x)
for matrix,i in zip(x,range(len(x))): # iterate through every matrix in the list
    for column in matrix.transpose(): # iterate through every column in the matrix
        col=scaler.fit_transform(np.reshape(column,(-1,1)))
        if 'Matrx' in locals():
            Matrx=np.hstack((Matrx,col))
        else:
            Matrx=col
    data[i]=Matrx
    del Matrx # reset so the next matrix starts from scratch
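A list-comprehension version of the loop above might look like the following sketch. To keep it runnable without scikit-learn, it uses a hypothetical NumPy stand-in, min_max, in place of the scaler (that name and its constant-column handling are assumptions, not part of the original code); np.column_stack then rebuilds one array per matrix.

```python
import numpy as np

def min_max(column):
    """Hypothetical stand-in for the per-column function: scale to [0, 1]."""
    column = np.asarray(column, dtype=float)
    span = column.max() - column.min()
    if span == 0:
        # Constant column: return zeros instead of dividing by zero
        return np.zeros_like(column)
    return (column - column.min()) / span

x = [
    np.array([[1, 2, 3], [11, 12, 13], [111, 112, 113]]),
    np.array([[4, 5, 6], [14, 15, 16], [114, 115, 116], [1114, 1115, 1116]]),
    np.array([[11, 12, 13], [111, 112, 113]]),
    np.array([[7, 8, 9], [17, 18, 19], [117, 118, 119], [1117, 1118, 1119]]),
]

# One array per matrix: apply the function to each column, then glue the
# results back together as columns.
data = [np.column_stack([min_max(column) for column in matrix.T]) for matrix in x]
```

Each entry of data should have the same shape as the matching entry of x. For MinMaxScaler in particular, which scales every feature column independently, [scaler.fit_transform(matrix) for matrix in x] would probably give the same per-matrix result without the inner loop.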
Upvotes: 0
Reputation: 231385
With your x (thanks for making it cut-n-paste friendly):
In [291]: x=list([np.array([[1,2,3],[11,12,13],[111,112,113]]),np.array([[4,5,6
...: ],[14,15,16],[114,115,116],[1114,1115,1116]]),np.array([[11,12,13],[11
...: 1,112,113]]),np.array([[7,8,9],[17,18,19],[117,118,119],[1117,1118,111
...: 9]])])
In [292]: x
Out[292]:
[array([[ 1, 2, 3],
[ 11, 12, 13],
[111, 112, 113]]), array([[ 4, 5, 6],
[ 14, 15, 16],
[ 114, 115, 116],
[1114, 1115, 1116]]), array([[ 11, 12, 13],
[111, 112, 113]]), array([[ 7, 8, 9],
[ 17, 18, 19],
[ 117, 118, 119],
[1117, 1118, 1119]])]
In [293]: len(x)
Out[293]: 4
In [294]: [i.shape for i in x]
Out[294]: [(3, 3), (4, 3), (2, 3), (4, 3)]
I haven't tried to digest your intended processing, but since the arrays have different shapes, I don't see how you can avoid processing each separately. They can't be combined into any sort of higher dimensional array.
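As a small illustration of that point, here is a sketch with two made-up arrays (a and b are placeholders, not the data from the question). np.stack refuses inputs of different shapes, so the arrays stay in a list and are handled one at a time.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)    # shape (3, 3)
b = np.arange(12).reshape(4, 3)   # shape (4, 3)

# Stacking into one 3D array needs identical shapes, so this fails.
try:
    np.stack([a, b])
    stacked = True
except ValueError:
    stacked = False

# Processing each array on its own works regardless of shape.
shapes = [m.shape for m in (a, b)]
column_sums = [m.sum(axis=0) for m in (a, b)]
```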
I'm not going to try to apply fit_transform, but it is apparent that Data is a list of lists. I don't know what those inner lists contain.
Maybe it would help if you described the problem, possibly in a simplified form, with just one element of the x list. I prefer to run a concrete example, and look at the resulting arrays and lists in my own Python session. Word descriptions just aren't clear enough.
Upvotes: 0
Reputation: 12157
This should work (if I've understood correctly):
def f(column):
    ... # function you want to apply to each column
data = [f(column) for matrix in x for column in matrix.T]
It's a double for loop that collects the results in one flat list, equivalent to (but typically a little faster than):
data = []
for matrix in x: # iterate through every matrix in the list
    for column in matrix.transpose(): # iterate through every column in the matrix
        data.append(f(column))
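If the results should stay grouped per matrix rather than in one flat list, np.apply_along_axis is one possible alternative. The sketch below assumes f takes a 1D column and returns a 1D array of the same length; the centering function and the small example arrays are placeholders chosen for illustration.

```python
import numpy as np

def f(column):
    # Placeholder per-column function: subtract the column mean
    return column - column.mean()

x = [
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]),
]

# Flat variant: one result per column across all matrices
flat = [f(column) for matrix in x for column in matrix.T]

# Grouped variant: axis=0 hands each column of a matrix to f in turn
grouped = [np.apply_along_axis(f, 0, matrix) for matrix in x]
```

Here flat holds four 1D arrays, while grouped holds two 2D arrays with the same shapes as the inputs.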
Upvotes: 1