Reputation: 239
I need to append the arrays created in each loop so that I get a single ndarray at the end. The code structure is like this:
for...:
.
.
.
for...:
list1 = array([some_math_here])
list2.append(list1)
#each loop creates a list; converting it to array() gives differently shaped arrays:
array(list2).shape
(2939, 4)
(2942, 4)
(2027, 4)
(2030, 4)
#list3 collects all the generated results
list3.append(list2)
Q:
How can I have an array instead of list3, with n*4 columns and a different number of rows?
I tried creating an empty array a = array([0.,1.]) and then calling append(a,array(list_2)), but it doesn't work. I'm aware of hstack, vstack, etc., but cannot make use of them together with append in the loop. Any advice?
UPDATE Here's the actual code with output from suggested methods:
files_ = glob.glob('D:\Test files\*.txt')
tfile_ = loadtxt('times.txt')
averages_, d = [], []
with open ('outfile.csv', 'wb') as outfile:
    writer = csv.writer(outfile)
    for fcount_, fname_ in enumerate(files_):
        data = loadtxt(fname_ , usecols = (1,2,3,4))
        average_, fcol = [], []
        seg_len = 3
        for x in range(0, len(data[:,0]), seg_len):
            sample_means = [mean(data[x:x+seg_len,i]) for i in range(4)]
            none_zeros = [x if x >= 0 else x == 0 for x in sample_means]
            average_.append(none_zeros)
        fcol = cumsum(array(average_)[:,0])
        average_ = array([row + [col] for row, col in zip(average_, fcol)])
        averages_.append(average_)
    d = concatenate(array(averages_))
    df = pd.DataFrame(d)
    df.to_csv('pdtest2.csv')
output:
0 1 2 3 4
0 0.037039 0.103792 0.136116 0.579297 0.037039
1 0.051183 0.104669 0.177728 0.593771 0.088222
2 0.059517 0.105437 0.174274 0.571402 0.147739
3 0.053212 0.102476 0.167530 0.645745 0.200950
4 0.054637 0.104450 0.165228 0.596622 0.054637
5 0.051622 0.101161 0.166708 0.595964 0.106259
6 0.057324 0.099077 0.168024 0.596841 0.163583
7 0.054692 0.103573 0.157168 0.598596 0.218275
8 0.066699 0.100612 0.145984 0.591578 0.284974
9 0.120866 5.527104 4.678589 2.401020 0.120866
10 0.113958 5.176220 4.669872 2.361985 0.234824
11 0.121469 4.879613 4.659017 2.359573 0.356293
12 0.122511 4.695618 4.642240 2.363959 0.478803
13 0.126650 4.621933 4.620447 2.347073 0.605453
14 0.132708 4.676868 4.517568 2.364617 0.132708
15 0.125087 4.693535 4.459672 2.381941 0.257795
16 0.132708 4.715246 4.444705 2.334353 0.390503
17 0.133476 4.745619 4.406300 2.317467 0.523979
while I want:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 0.037038522 0.103792144 0.136115724 0.57929719 0.037038522 0.054637318 0.104450043 0.16522775 0.596621864 0.054637318 0.12086581 5.527104488 4.678589189 2.401020431 0.12086581 0.132707991 4.67686799 4.517567512 2.364616645 0.132707991
1 0.051183348 0.104669343 0.177727829 0.593770968 0.08822187 0.051621948 0.101160549 0.166708023 0.595963965 0.106259265 0.113957871 5.176219782 4.669871979 2.361985046 0.234823681 0.125087328 4.693534961 4.459672089 2.381941338 0.257795319
2 0.059516735 0.105436892 0.17427386 0.571402402 0.147738605 0.057323738 0.099077202 0.168023821 0.596841163 0.163583003 0.121468884 4.879613015 4.659016582 2.359572747 0.356292565 0.132707991 4.715245885 4.444704808 2.334353258 0.39050331
3 0.05321187 0.102476346 0.167530397 0.645744989 0.200950475 0.054692143 0.103572845 0.157168489 0.598595561 0.218275146 0.122510557 4.695618334 4.642240062 2.363958746 0.478803122 0.13347554 4.745619253 4.406299754 2.317467166 0.52397885
4 0 0 0 0 0 0.066698797 0.1006123 0.145984208 0.591577971 0.284973943 0.126649838 4.621932787 4.620447035 2.347072653 0.60545296 0 0 0 0 0
Upvotes: 3
Views: 7474
Reputation: 54340
No, you can't create an n*4 2D array if n is different for each column:
>>> np.vstack((np.arange(10),np.arange(1,11),np.arange(2,12)))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]])
>>> np.vstack((np.arange(10),np.arange(0,11),np.arange(0,12)))
Traceback (most recent call last):
File "<pyshell#36>", line 1, in <module>
np.vstack((np.arange(10),np.arange(0,11),np.arange(0,12)))
File "C:\Python27\lib\site-packages\numpy\core\shape_base.py", line 226, in vstack
return _nx.concatenate(map(atleast_2d,tup),0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Note the ValueError that is raised when the arrays have different dimensions. You either have to keep list3 as a list or pad each list2 to equal length.
For higher dimensions, the same rule applies: np.vstack((np.ones((10,4)),np.ones((10,6)),np.ones((10,6)))) won't work, but np.vstack((np.ones((10,4)),np.ones((11,4)),np.ones((12,4)))) will, and it creates a 33*4 array.
In your case, if you vstack your list2s, you will get a 9938*4 array, if that is what you want. (I don't get the "different number of rows" part.)
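To make the rule concrete, here is a minimal sketch that uses placeholder arrays with the four shapes reported in the question. The names blocks and mismatched are hypothetical, and the np.ones data stands in for the real results.

```python
import numpy as np

# Placeholder arrays with the shapes from the question:
# same column count (4), different row counts.
row_counts = [2939, 2942, 2027, 2030]
blocks = [np.ones((n, 4)) for n in row_counts]

# Stacking along axis 0 only requires the column counts to match.
stacked = np.vstack(blocks)
print(stacked.shape)  # (9938, 4)

# If the column counts differ, vstack raises ValueError.
mismatched = [np.ones((10, 4)), np.ones((10, 6))]
try:
    np.vstack(mismatched)
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```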
EDIT:
To pad the shorter arrays so that every array has the same shape, you need np.lib.pad (the same function is exposed as np.pad):
>>> b=np.random.randint(0,20, size=(12,4))
>>> np.lib.pad(b, ((0,3),(0,0)), 'constant', constant_values=[0.])
array([[ 5, 2, 10, 7],
[ 7, 17, 8, 11],
[ 7, 7, 2, 10],
[16, 17, 15, 16],
[ 0, 19, 5, 6],
[18, 19, 18, 6],
[ 2, 8, 11, 19],
[ 3, 17, 18, 16],
[10, 1, 12, 11],
[ 0, 7, 1, 14],
[ 7, 17, 8, 16],
[12, 6, 3, 5],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0]])
((0,3),(0,0)) means to pad 0 elements at the beginning of the first axis and 3 elements at its end, and to pad nothing along the second axis. In your case you need ((0,max_length-length_of_current_array),(0,0)).
Then you just stack them all up using np.hstack.
But in my opinion you may want to pad with nan instead of 0., because 0. may be a meaningful data value.
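Putting the pad-then-hstack steps together, a minimal sketch might look like this. It assumes four per-file arrays with 5 columns and 4, 5, 5 and 4 rows, mirroring the output in the question; the random data and the names arrays, max_len, padded and wide are hypothetical. It calls np.pad, the public name for the padding function, rather than np.lib.pad.

```python
import numpy as np

# Hypothetical per-file results: 5 columns each, 4 or 5 rows.
arrays = [np.random.rand(n, 5) for n in (4, 5, 5, 4)]

# Every array is padded at the end of axis 0 up to the longest one.
max_len = max(a.shape[0] for a in arrays)
padded = [
    np.pad(a, ((0, max_len - a.shape[0]), (0, 0)),
           mode='constant', constant_values=np.nan)
    for a in arrays
]

# Equal row counts now allow side-by-side stacking.
wide = np.hstack(padded)
print(wide.shape)  # (5, 20)
```

Padding with nan only works on floating-point arrays; an integer array would need to be converted with astype(float) first.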
Upvotes: 3
Reputation: 249153
I think what you want is to make a plain Python list of NumPy arrays. Each array will have the same column count and types, but perhaps different numbers of rows. So you should start with simply list3 = [] and then do list3.append(arr) N times. Then at the end, if you want one big NumPy array with all the rows, just do np.concatenate(list3) to put them all together at once. This is more efficient than trying to concatenate NumPy arrays in the loop.
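As a rough illustration of this pattern, the sketch below collects arrays in a list and concatenates once after the loop. The row counts and the np.full placeholder data are assumptions made for the example, not values from the question.

```python
import numpy as np

list3 = []
for n_rows in (3, 5, 2):  # hypothetical row counts, one per iteration
    # Placeholder for the array computed in each iteration (4 columns).
    arr = np.full((n_rows, 4), float(n_rows))
    list3.append(arr)     # cheap: the list only stores a reference

# One concatenation at the end copies the data a single time.
result = np.concatenate(list3)
print(result.shape)  # (10, 4)
```

Calling np.append or np.concatenate inside the loop would instead copy all previously accumulated rows on every iteration.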
Upvotes: 1