Daniel

Reputation: 101

How to create a matrix based on bins?

I have a set of 20 values, each in the range 3-27, stored as the second element of these tuples:

A = [(0,21),(1,12),(2,15),(3,3),(4,21),(5,15),(6,27),(7,21),(8,9),(9,27),(10,12),(11,9),(12,12),(13,3),(14,9),(15,12),(16,6),(17,3),(18,9),(19,15)]

I would like to learn how to create a numpy array with 9 bins, one for each multiple of 3 in the range 3-27, where each bin covers its value -1 to +1 and is matched against the second element of each tuple (tuple[1]). The bin values should be interchangeable with any other set of integers and range. In the end, I'd like to be able to create a matrix that looks something like this:

[[0,0,0,0,0,0,1,0,0],
 [0,0,0,1,0,0,0,0,0],
 [0,0,0,0,1,0,0,0,0],
 [1,0,0,0,0,0,0,0,0],
 [0,0,0,0,0,0,1,0,0],
 [0,0,0,0,1,0,0,0,0],
 [0,0,0,0,0,0,0,0,1],
 ....]

I was reading about how numpy has (num, bins) = histogram(x, bins=None, range=None), but I'm not quite sure how to go about that.

I was thinking that I would have to iterate through 'A' to get the unique values ('a') and then build each range as (a-1, a+1); to get the number of bins I would just do len(unique_values). But then I'm lost. Can anyone guide me?

Upvotes: 2

Views: 96

Answers (1)

Divakar

Reputation: 221574

Here's one way with np.searchsorted/np.digitize -

import numpy as np

bins = np.arange(3,28,3) # bin values: 3, 6, ..., 27
ar = np.asarray(A)[:,1] # or np.array([i[1] for i in A])
ids = np.searchsorted(bins, ar) # or np.digitize(ar,bins)-1
out = (ids[:,None] == np.arange(9)).astype(int)
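
Note that the two variants above agree only when every value coincides exactly with a bin value, as in the question's data; for a value such as 14, searchsorted returns 4 while digitize-1 returns 3. If the -1/+1 tolerance from the question matters, one possible sketch is to place the bin edges halfway between neighbouring bin values. The names centers, edges and one_hot, and the off-center sample values 4, 14 and 26, are assumptions made for illustration and are not part of the original data -

```python
import numpy as np

# Bin centers as in the question: 3, 6, ..., 27 (9 of them).
centers = np.arange(3, 28, 3)

# Hypothetical values, including off-center ones (4, 14, 26).
values = np.array([21, 12, 15, 3, 4, 14, 26])

# Edges halfway between neighbouring centers: 4.5, 7.5, ..., 25.5
edges = (centers[:-1] + centers[1:]) / 2.0

# np.digitize counts how many edges are <= each value, which here
# is the index of the nearest center.
ids = np.digitize(values, edges)

# One row per value, with a single 1 in the column of its bin.
one_hot = (ids[:, None] == np.arange(len(centers))).astype(int)
```

With this choice every value is mapped to its nearest bin value, so anything far below 3 or far above 27 would land in the first or last bin; an explicit range check would be needed to reject such values.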

The last step to get the final output could be replaced by array-initialization -

out = np.zeros((len(ids), 9),dtype=int)
out[np.arange(len(ids)), ids] = 1
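
As a quick sanity check, here is a self-contained sketch that builds the output both ways on the question's data and compares them. The names out_cmp, out_init and same are only illustrative -

```python
import numpy as np

A = [(0,21),(1,12),(2,15),(3,3),(4,21),(5,15),(6,27),(7,21),(8,9),(9,27),
     (10,12),(11,9),(12,12),(13,3),(14,9),(15,12),(16,6),(17,3),(18,9),(19,15)]

bins = np.arange(3, 28, 3)
ar = np.asarray(A)[:, 1]
ids = np.searchsorted(bins, ar)

# Variant 1: broadcasted comparison against the column indices.
out_cmp = (ids[:, None] == np.arange(9)).astype(int)

# Variant 2: zero-initialize, then set one element per row.
out_init = np.zeros((len(ids), 9), dtype=int)
out_init[np.arange(len(ids)), ids] = 1

same = np.array_equal(out_cmp, out_init)
```

The comparison variant creates a temporary boolean array of the full output shape, whereas the initialization variant writes only one element per row, so the latter may be preferable for a large number of bins.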

If the first elements of the tuples were not in sequence, we might want to use them to index into the rows of the zero-initialized array -

out[np.asarray(A)[:,0], ids] = 1
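
A minimal sketch of that case, using a hypothetical shuffled input (the four tuples below are made up for illustration) and assuming the first elements are valid, non-negative row indices -

```python
import numpy as np

# Hypothetical input: first elements are row labels, not in order.
A = [(2, 15), (0, 21), (3, 3), (1, 12)]
arr = np.asarray(A)

bins = np.arange(3, 28, 3)
ids = np.searchsorted(bins, arr[:, 1])

# Size the output by the largest row label rather than by len(A).
out = np.zeros((arr[:, 0].max() + 1, len(bins)), dtype=int)
out[arr[:, 0], ids] = 1
```

Here row 0 gets its 1 in the column for 21, even though that tuple is listed second.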

Sample run -

In [205]: A
Out[205]: 
[(0, 21),
 (1, 12),
 (2, 15),
 (3, 3),
 (4, 21),
 (5, 15),
 (6, 27),
 (7, 21),
 (8, 9),
 (9, 27),
 (10, 12),
 (11, 9),
 (12, 12),
 (13, 3),
 (14, 9),
 (15, 12),
 (16, 6),
 (17, 3),
 (18, 9),
 (19, 15)]

In [206]: out[:7] # first 7 rows of output
Out[206]: 
array([[0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1]])

Upvotes: 2
