I have a numpy array of numbers as follows:
array([[ 0.00365172, -0.01862929, 0.00739219, ..., -0.05520727,
-0.00388453, -0.00591132],
[ 0.00084692, -0.0177305 , 0.00618157, ..., -0.05275924,
-0.00323982, -0.00107789],
[ 0.01276451, -0.00361472, 0.0008607 , ..., 0.00464235,
0.00075972, 0.00700309]], dtype=float32)
What is the most efficient way to store this on disk such that each row (number vector) is encoded as a text label, as follows:
array(['LabelA',
'LabelB',
'LabelC'])
In other words, LabelA, LabelB and LabelC are string representations of the corresponding number vectors. My goal is to store this kind of array in a human-readable format. I don't really care about reading the numerical values back; I only want to preserve each vector's one-to-one relationship with its unique label.
Is something like this possible? Thanks.
It seems you are trying to assign a label to each row based on which rows are identical to each other. So you can use np.unique with its axis argument (available since NumPy 1.13) to treat each row as one item, and with return_inverse to give every row the ID of the unique row it matches, like so:
In [42]: a = np.array([[3,5,8,2],[4,1,5,2],[3,5,8,2]])
In [43]: unique_ids = np.unique(a,axis=0, return_inverse=1)[1]
In [44]: unique_ids
Out[44]: array([0, 1, 0])
In [45]: ['Label'+str(i) for i in unique_ids]
Out[45]: ['Label0', 'Label1', 'Label0']
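Outside IPython, the same idea can be wrapped in a small script that also writes the labels to disk as plain text, one per line, which covers the "human-readable on disk" part of the question. This is a sketch under assumptions: label_rows and the file name labels.txt are hypothetical names chosen for illustration.

```python
import numpy as np


def label_rows(a, prefix="Label"):
    """Return one string label per row of a 2-D array; identical rows share a label."""
    # axis=0 treats each row as a single item; return_inverse gives, for every
    # input row, the index of the unique row it equals.
    _, inverse = np.unique(a, axis=0, return_inverse=True)
    # ravel() keeps the inverse 1-D regardless of NumPy version.
    return [prefix + str(i) for i in inverse.ravel()]


a = np.array([[3, 5, 8, 2], [4, 1, 5, 2], [3, 5, 8, 2]])
labels = label_rows(a)
print(labels)  # ['Label0', 'Label1', 'Label0']

# Store the labels as plain text, one label per line (hypothetical file name).
with open("labels.txt", "w") as f:
    f.write("\n".join(labels) + "\n")
```

Note that np.unique numbers the unique rows in sorted order, not in order of first appearance, and compares values exactly, so two float32 rows that differ only in the last digit get different labels.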
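For up to 26 unique labels, we can use capital letters instead (string.ascii_uppercase works in both Python 2 and 3; the older string.uppercase does not exist in Python 3):
In [50]: import string
In [51]: ['Label'+string.ascii_uppercase[i] for i in unique_ids]
Out[51]: ['LabelA', 'LabelB', 'LabelA']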
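Indexing into string.ascii_uppercase runs out after 26 unique rows. One possible way around that, assuming spreadsheet-style names (A..Z, AA, AB, ...) are acceptable as labels, is a small bijective base-26 converter. letter_code is a hypothetical helper written for this sketch, not part of NumPy or the string module.

```python
import string

import numpy as np


def letter_code(i):
    """Map 0 -> 'A', 25 -> 'Z', 26 -> 'AA', 27 -> 'AB', ... (spreadsheet-style)."""
    letters = ""
    i += 1  # switch to 1-based counting for bijective base-26
    while i > 0:
        i, rem = divmod(i - 1, 26)
        letters = string.ascii_uppercase[rem] + letters
    return letters


a = np.array([[3, 5, 8, 2], [4, 1, 5, 2], [3, 5, 8, 2]])
unique_ids = np.unique(a, axis=0, return_inverse=True)[1].ravel()
print(["Label" + letter_code(i) for i in unique_ids])  # ['LabelA', 'LabelB', 'LabelA']
```

The labels stay unique for any number of unique rows, at the cost of growing longer (three letters from the 703rd unique row on).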