Reputation: 765
As a minimal working example, I have a file.txt containing a list of numbers:
1.1
2.1
3.1
4.1
5.1
6.1
7.1
8.1
These values should actually be paired with (x, y) indices, which makes it a 3D array:
0 0 1.1
1 0 2.1
0 1 3.1
1 1 4.1
0 2 5.1
1 2 6.1
0 3 7.1
1 3 8.1
I want to import this 3D array into Python. So far I have been using bash to generate the indices, pasting them in front of the values in file.txt, and then importing the resulting full.txt into Python with pandas:
for ((y=0;y<=3;y++)); do
  for ((x=0;x<=1;x++)); do
    echo -e "$x\t$y"
  done
done > index.txt
paste index.txt file.txt > full.txt
Writing index.txt has been slow in my actual code, where x goes up to 9000 and y up to 5000. Is there a way to generate the indices as the first two columns of a 2D NumPy array in Python, so that I only need to import the data from file.txt as the third column?
Upvotes: 1
Views: 139
Reputation: 5026
I would recommend using pandas for loading the data and managing columns with different types.
We can generate the indices with np.indices for the desired dimensions and reshape them to match your format, then concatenate the values from 'file.txt'.
Creating the index for (9000,5000) takes about 950 ms on a Colab instance.
import numpy as np
import pandas as pd
x,y = 2,4 # dimensions, also works with 9000,5000 but assumes 'file.txt' has the correct size
pd.concat([
    pd.DataFrame(np.indices((x,y)).ravel('F').reshape(-1,2), columns=['ind1','ind2']),
    pd.read_csv('file.txt', header=None, names=['Value'])
], axis=1)
Out:
ind1 ind2 Value
0 0 0 1.1
1 1 0 2.1
2 0 1 3.1
3 1 1 4.1
4 0 2 5.1
5 1 2 6.1
6 0 3 7.1
7 1 3 8.1
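The question literally asks for a 2D NumPy array rather than a DataFrame. A minimal sketch of that variant, assuming file.txt holds one value per line in the same order as above (x varying fastest); io.StringIO stands in for the real file so the snippet runs on its own:

```python
import io
import numpy as np

nx, ny = 2, 4  # grid dimensions; the number of values must equal nx * ny

# Stand-in for 'file.txt' (assumption: one value per line, x varying fastest)
data = io.StringIO("1.1\n2.1\n3.1\n4.1\n5.1\n6.1\n7.1\n8.1\n")
values = np.loadtxt(data)

# Same index construction as in the answer: one (x, y) pair per row
idx = np.indices((nx, ny)).ravel('F').reshape(-1, 2)

# Indices become columns 0 and 1, the values column 2 (result is a float array)
full = np.column_stack([idx, values])
print(full)
```

With the real data, passing 'file.txt' to np.loadtxt and using the real nx, ny should work the same way, provided the file has exactly nx * ny lines.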
First, create the indices for the desired dimensions with np.indices:
np.indices((2,4))
Out:
array([[[0, 0, 0, 0],
        [1, 1, 1, 1]],

       [[0, 1, 2, 3],
        [0, 1, 2, 3]]])
This gives us the right indices, but in the wrong order.
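To see why the order is wrong, it helps to check what np.indices returns: an array of shape (2, 2, 4) holding one grid per coordinate. A plain row-major flatten lists all first-coordinate values before all second-coordinate values instead of interleaving them into pairs:

```python
import numpy as np

idx = np.indices((2, 4))
print(idx.shape)    # (2, 2, 4): one 2x4 grid per coordinate
print(idx.ravel())  # [0 0 0 0 1 1 1 1 0 1 2 3 0 1 2 3]

# idx[0][i, j] == i and idx[1][i, j] == j for every grid point (i, j)
for i in range(2):
    for j in range(4):
        assert idx[0][i, j] == i and idx[1][i, j] == j
```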
With .ravel('F') we can flatten the array in column-major (Fortran) order, so the first axis varies fastest:
np.indices((2,4)).ravel('F')
Out:
array([0, 0, 1, 0, 0, 1, 1, 1, 0, 2, 1, 2, 0, 3, 1, 3])
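One way to see why column-major order produces the pairs: flattening in 'F' order is the same as reversing all axes with a transpose and flattening in the default row-major order. After the transpose the coordinate axis comes last, so consecutive elements are the two coordinates of one grid point. A quick check of that identity:

```python
import numpy as np

idx = np.indices((2, 4))  # shape (2, 2, 4)

# Column-major flattening of idx ...
f_order = idx.ravel('F')
# ... equals row-major flattening of the fully transposed array (shape (4, 2, 2))
c_order_of_T = idx.T.ravel()
assert np.array_equal(f_order, c_order_of_T)

# In idx.T, element [j, i, k] is coordinate k of grid point (i, j),
# so the last axis holds the (i, j) pair of each point
print(idx.T[1, 0])  # [0 1], the coordinates of point (0, 1)
```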
To get the desired columns, reshape into a 2D array with shape (8,2). With (-1,2), NumPy infers the first dimension:
np.indices((2,4)).ravel('F').reshape(-1,2)
Out:
array([[0, 0],
       [1, 0],
       [0, 1],
       [1, 1],
       [0, 2],
       [1, 2],
       [0, 3],
       [1, 3]])
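An alternative that builds the same (8, 2) array with a different NumPy function (not the answer's method) is np.unravel_index, which converts flat row numbers directly into grid coordinates; order='F' makes the first coordinate vary fastest:

```python
import numpy as np

nx, ny = 2, 4
flat = np.arange(nx * ny)  # row numbers in file.txt

# Decode each row number into (x, y), with x (the first axis) varying fastest
xs, ys = np.unravel_index(flat, (nx, ny), order='F')
pairs = np.stack([xs, ys], axis=1)
print(pairs)
```

Unlike np.indices, this only needs the row numbers, so it could also decode an arbitrary subset of rows.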
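Then convert into a DataFrame with columns ind1 and ind2. The same approach extends to more dimensions; for example, for a (2,4,3) grid, using add_prefix('ind') to name the columns: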
pd.DataFrame(np.indices((2,4,3)).ravel('F').reshape(-1,3)).add_prefix('ind')
Out:
ind0 ind1 ind2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 2 0
5 1 2 0
6 0 3 0
7 1 3 0
8 0 0 1
9 1 0 1
10 0 1 1
11 1 1 1
12 0 2 1
13 1 2 1
14 0 3 1
15 1 3 1
16 0 0 2
17 1 0 2
18 0 1 2
19 1 1 2
20 0 2 2
21 1 2 2
22 0 3 2
23 1 3 2
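The steps above can be wrapped into a small helper for any number of dimensions. indexed_frame is a hypothetical name, and the sketch assumes the values come in the same first-axis-fastest order as the examples. It checks the length up front, since pd.concat with mismatched lengths would silently pad with NaN:

```python
import numpy as np
import pandas as pd


def indexed_frame(values, shape):
    """Hypothetical helper: attach grid indices to a flat sequence of values.

    Assumes len(values) == prod(shape) and that the first axis varies fastest,
    matching the ravel('F') ordering used above.
    """
    values = np.asarray(values)
    expected = int(np.prod(shape))
    if values.size != expected:
        raise ValueError(f"expected {expected} values, got {values.size}")
    # One row per grid point, one column per axis
    idx = np.indices(shape).ravel('F').reshape(-1, len(shape))
    df = pd.DataFrame(idx).add_prefix('ind')
    df['Value'] = values
    return df


print(indexed_frame([1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1], (2, 4)))
```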
Upvotes: 2
Reputation: 1304
If you want to stick with bash, you can avoid the nested loop when x only takes the values 0 and 1:
Code:
for ((y=0;y<=3;y++)); do
    echo -e "0\t$y\n1\t$y"
done
Output:
0 0
1 0
0 1
1 1
0 2
1 2
0 3
1 3
The same loop in Python:
Code:
for y in range(4):
    print(f'0\t{y}\n1\t{y}')
Output:
0 0
1 0
0 1
1 1
0 2
1 2
0 3
1 3
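The two-value trick only applies when x is 0 or 1. For the question's sizes, a sketch that generalizes the Python loop to any nx and ny, using itertools.product (an addition here, not part of the answer) to produce the lines in the same order:

```python
from itertools import product


def index_lines(nx, ny):
    """Yield one tab-separated 'x y' line per grid point, x varying fastest."""
    # product() varies its last argument fastest, so y is the outer loop and x the inner one
    for y, x in product(range(ny), range(nx)):
        yield f"{x}\t{y}"


print("\n".join(index_lines(2, 4)))
```

To produce index.txt, the lines could be joined and written with a single write call rather than printed one at a time.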
Upvotes: 0
Reputation: 1841
Here is a quick example of how to create the 3D array from a 1D array. As dummy data I use random numbers. The code then creates (x, y, value) tuples.
It takes about a minute for 45M rows.
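from random import randrange
import pandas as pd

x = 5000
y = 9000
numbers = [randrange(100000, 999999) for i in range(x*y)]
# the value for grid point (a, b) sits at flat position b*x + a (a varies fastest)
array = [(a, b, numbers[b*x + a]) for a in range(x) for b in range(y)]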
Output
pd.DataFrame(array)
Out:
0 1 2
0 0 0 878704
1 0 1 524573
2 0 2 943657
3 0 3 496507
4 0 4 802714
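The list comprehension above runs element by element in Python, which is likely why 45M rows take on the order of a minute. A vectorized sketch swaps in np.divmod on the row numbers to compute x and y for all rows at once (a different technique from the answer's). It uses a small nx, ny for illustration and orders rows with x varying fastest, as in the question's file, so the row order differs from the output above:

```python
import numpy as np
import pandas as pd

nx, ny = 2, 4  # small stand-in for the real grid size
rng = np.random.default_rng(0)
numbers = rng.integers(100000, 999999, size=nx * ny)  # dummy data, as in the answer

# Row k of the file holds grid point (x, y) with x = k % nx and y = k // nx
ys, xs = np.divmod(np.arange(nx * ny), nx)
df = pd.DataFrame({'x': xs, 'y': ys, 'value': numbers})
print(df)
```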
Upvotes: 0