D.prd

Reputation: 675

How to preserve encoded numpy array on disk?

I have a numpy array of numbers as follows:

array([[ 0.00365172, -0.01862929,  0.00739219, ..., -0.05520727,
        -0.00388453, -0.00591132],
       [ 0.00084692, -0.0177305 ,  0.00618157, ..., -0.05275924,
        -0.00323982, -0.00107789],
       [ 0.01276451, -0.00361472,  0.0008607 , ...,  0.00464235,
         0.00075972,  0.00700309]], dtype=float32)

What is the most efficient way to store this on disk such that each row of the array is encoded as a text label, as follows:

array(['LabelA',
       'LabelB',
       'LabelC'])

In other words, LabelA, LabelB and LabelC are string representations of the corresponding number vectors. My goal is to store this kind of array in a human-readable format. I don't really care about reading the numerical values, only about preserving their one-to-one relationship with the corresponding unique labels.

Is something like this possible? Thanks.

Upvotes: 0

Views: 41

Answers (1)

Divakar

Reputation: 221584

It seems you want to assign a label to each row, with identical rows sharing the same label. You can use np.unique with axis=0 to treat each row as a single item, and return_inverse to get, for every row, the index of its matching unique row, like so -

In [41]: import numpy as np

In [42]: a = np.array([[3,5,8,2],[4,1,5,2],[3,5,8,2]])

In [43]: unique_ids = np.unique(a, axis=0, return_inverse=True)[1]

In [44]: unique_ids
Out[44]: array([0, 1, 0])

In [45]: ['Label'+str(i) for i in unique_ids]
Out[45]: ['Label0', 'Label1', 'Label0']
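The same steps can be wrapped in a small function that also returns the distinct rows, so every label can be mapped back to its vector. This is a sketch; the name `row_labels` is hypothetical. The `.ravel()` call is a cheap safeguard that keeps the IDs 1-D across NumPy versions (NumPy 2.0.0 briefly returned a differently shaped inverse when `axis` was given). Note that `np.unique` compares float rows exactly, so two float32 vectors that differ only by rounding noise will get different labels.

```python
import numpy as np

def row_labels(a, prefix="Label"):
    """Return one label per row of a 2-D array; identical rows share a label."""
    # unique_rows[k] is the k-th distinct row (sorted); inverse[i] is the
    # position of row i of `a` inside unique_rows
    unique_rows, inverse = np.unique(a, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # keep the IDs 1-D regardless of NumPy version
    labels = [prefix + str(i) for i in inverse]
    return labels, unique_rows

a = np.array([[3, 5, 8, 2], [4, 1, 5, 2], [3, 5, 8, 2]])
labels, unique_rows = row_labels(a)
print(labels)  # ['Label0', 'Label1', 'Label0']

# Round trip: the number in each label indexes back into unique_rows
restored = np.array([unique_rows[int(lab[len("Label"):])] for lab in labels])
assert (restored == a).all()
```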

For up to 26 unique labels, we can use capital letters instead -

In [50]: import string

In [51]: ['Label'+string.ascii_uppercase[i] for i in unique_ids]
Out[51]: ['LabelA', 'LabelB', 'LabelA']
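Single capital letters run out after 26 distinct rows. One way past that (an assumption, not part of the original answer) is spreadsheet-column-style names: A..Z, then AA, AB, and so on. The sketch below also stores the result on disk as the question asks: the labels go to a plain-text file, one per line, and the distinct rows go to a binary `.npy` file, so row k of that file is the vector behind the k-th letter label. The helper `letter_label` and both filenames are hypothetical; a temporary directory is used only to keep the example self-contained.

```python
import os
import string
import tempfile

import numpy as np

def letter_label(k):
    """Map 0 -> 'A', 25 -> 'Z', 26 -> 'AA', 27 -> 'AB', ... (spreadsheet-column style)."""
    letters = ""
    k += 1  # work in 1-based "bijective base-26", which has no zero digit
    while k > 0:
        k, rem = divmod(k - 1, 26)
        letters = string.ascii_uppercase[rem] + letters
    return letters

a = np.array([[3, 5, 8, 2], [4, 1, 5, 2], [3, 5, 8, 2]])
unique_rows, inverse = np.unique(a, axis=0, return_inverse=True)
labels = ['Label' + letter_label(i) for i in inverse.ravel()]
print(labels)  # ['LabelA', 'LabelB', 'LabelA']

out_dir = tempfile.mkdtemp()
labels_path = os.path.join(out_dir, "labels.txt")       # hypothetical filename
rows_path = os.path.join(out_dir, "unique_rows.npy")    # hypothetical filename

# Human-readable labels, one per line
with open(labels_path, "w") as f:
    f.write("\n".join(labels) + "\n")

# Distinct vectors in binary; row k belongs to 'Label' + letter_label(k)
np.save(rows_path, unique_rows)
```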

Upvotes: 4
