Dhruv Ghulati

Reputation: 3036

Check value of a 2d array for different column indices in each row of 2d array

I have a binary 2D NumPy array (prediction) like:

[[1 0 1 0 1 1]
 [0 0 1 0 0 1]
 [1 1 1 1 1 0]
 [1 1 0 0 1 1]]

Each row of the array is the classification of one sentence, and each column says whether that sentence belongs to one category (1) or not (0). As an example, the categories (the categories array) are ['A','B','C','D','E','F'].

I have another 2D array (catIndex) that contains, for each row, the column index of the value to check, e.g.

[[0],
 [4],
 [5],
 [2]]

for the 4 instances above.

What I want to do now is loop through the binary 2D array and, for the column index specified for each sentence, check whether the value there is 1 or 0. If it is 1, I append the corresponding category to a new list (catResult = []); if it is 0, I append "no_region" instead.

So, for example, in sentence 1 I look at index 0 and check whether it is 0 or 1. It is 1, so I append 'A' to my new list. In sentence 2 I look at index 4, see that it is 0, and append "no_region" to the list.

Current code:

for index in catIndex:
    for i, sentence in enumerate(prediction):
        for j, binaryLabel in enumerate(sentence):
            if prediction[i][index] == 1:
                catResult.append(categories[index])
            else:
                catResult.append("no_region")
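The nested loops above visit every row once per entry in catIndex, and once more per element of that row, so they append far more than one label per sentence; each index is also a one-element array such as [4] rather than a plain integer. A minimal single-loop sketch of the intended behaviour, assuming prediction and catIndex are NumPy arrays shaped as shown in the question:

```python
import numpy as np

prediction = np.array([[1, 0, 1, 0, 1, 1],
                       [0, 0, 1, 0, 0, 1],
                       [1, 1, 1, 1, 1, 0],
                       [1, 1, 0, 0, 1, 1]])
categories = ['A', 'B', 'C', 'D', 'E', 'F']
catIndex = np.array([[0], [4], [5], [2]])  # one column index per sentence

catResult = []
# Pair each sentence (row) with its own column index: one label per sentence.
for sentence, idx in zip(prediction, catIndex):
    col = int(idx[0])  # each catIndex row is a one-element array, e.g. [4]
    if sentence[col] == 1:
        catResult.append(categories[col])
    else:
        catResult.append("no_region")

print(catResult)  # ['A', 'no_region', 'no_region', 'no_region']
```

This is fine for small inputs; the answers below replace the Python loop with NumPy indexing.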

Upvotes: 1

Views: 798

Answers (2)

hpaulj

Reputation: 231738

Make the 2d array:

In [54]: M=[[1,0,1,0,1,1],[0,0,1,0,0,1],[1,1,1,1,1,0],[1,1,0,0,1,1]]
In [55]: M=np.array(M)

Index the columns with ind, using [0,1,2,3] (np.arange(len(ind))) as the row index, so each row is paired with its own column:

In [56]: ind=[0,4,5,2]    
In [57]: m=M[np.arange(len(ind)),ind]
In [58]: m
Out[58]: array([1, 0, 0, 0])
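The question's catIndex has shape (4, 1) rather than being a flat list like ind. A minimal sketch, assuming NumPy 1.15 or later, showing that np.take_along_axis can consume that column-shaped array directly; flattening its result gives the same values as the fancy-indexing step.

```python
import numpy as np

prediction = np.array([[1, 0, 1, 0, 1, 1],
                       [0, 0, 1, 0, 0, 1],
                       [1, 1, 1, 1, 1, 0],
                       [1, 1, 0, 0, 1, 1]])
# catIndex as in the question: one column index per row, shape (4, 1)
catIndex = np.array([[0], [4], [5], [2]])

# take_along_axis pairs row i with catIndex[i]; the result keeps shape (4, 1)
picked = np.take_along_axis(prediction, catIndex, axis=1)

# Flatten to (4,) to match M[np.arange(len(ind)), ind]
m = picked.ravel()
print(m)  # [1 0 0 0]
```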

Map labels with ind:

In [59]: lbl=np.array(list('ABCDEF'),dtype=object)    
In [60]: res=lbl[ind]
In [61]: res
Out[61]: array(['A', 'E', 'F', 'C'], dtype=object)

Use np.where to choose, element by element, between the mapped label (where m is 1) and a fill value such as None (where m is 0). The object dtype makes it easy to substitute something else for the string label: None, 'no_region', etc.

In [62]: np.where(m, res, None)
Out[62]: array(['A', None, None, None], dtype=object)
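Putting these steps together with the question's own inputs and "no_region" as the fill value gives the following self-contained sketch. It assumes prediction and catIndex are NumPy arrays shaped as in the question, so catIndex is flattened first.

```python
import numpy as np

prediction = np.array([[1, 0, 1, 0, 1, 1],
                       [0, 0, 1, 0, 0, 1],
                       [1, 1, 1, 1, 1, 0],
                       [1, 1, 0, 0, 1, 1]])
catIndex = np.array([[0], [4], [5], [2]])
categories = np.array(list('ABCDEF'), dtype=object)

ind = catIndex.ravel()                         # shape (4, 1) -> (4,)
m = prediction[np.arange(len(ind)), ind]       # value at each row's chosen column
# Keep the category label where m is 1, otherwise use "no_region"
catResult = np.where(m == 1, categories[ind], 'no_region')

print(catResult.tolist())  # ['A', 'no_region', 'no_region', 'no_region']
```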

Upvotes: 1

Eelco Hoogendoorn

Reputation: 10769

Something along these lines should do it efficiently, though I'm not in a position to test it right now:

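import numpy as np

rows = len(prediction)
cols = catIndex.ravel()                                # (n, 1) -> (n,)
p = prediction[np.arange(rows), cols].astype(bool)     # boolean mask, not 0/1 integer indices
catResult = np.full(rows, 'no_region', dtype=object)   # ndarray.fill() returns None, so build with np.full
catResult[p] = np.asarray(categories, dtype=object)[cols][p]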

Upvotes: 0
