Reputation: 153
I have a 2D numpy array as follows:
import numpy as np
a=np.array([[1,2],[1,1], [2,1],[2,2],[3,2],[3,2], [3,1], [4,2],[4,1]])
print(a)
I need to count how many values of 1 or 2 occur in column 2 for each value in column 1. For example when x=3 in column 1, there are two instances of the value 2 and one instance of the value 1 in column 2.
Any direction on how to complete this would be appreciated! I think I could do some sort of for loop with np.unique but I am not sure...
Upvotes: 3
Views: 3906
Reputation: 25269
As in your comment, if you want a list-of-lists format, try this:
out = [[k, *np.unique(a[a[:,0] == k,1], return_counts=True)[1]]
       for k in np.unique(a[:,0])]
Out[838]: [[1, 1, 1], [2, 1, 1], [3, 1, 2], [4, 1, 1]]
For a 2D array:
out = np.array([[k, *np.unique(a[a[:,0] == k,1], return_counts=True)[1]]
                for k in np.unique(a[:,0])])
Out[850]:
array([[1, 1, 1],
       [2, 1, 1],
       [3, 1, 2],
       [4, 1, 1]], dtype=int64)
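One thing to watch with the comprehension above: np.unique only reports values that actually occur in each group, so the rows can end up with different lengths if some group is missing a value. A minimal sketch that fixes the set of column-2 values up front (the name values is only illustrative, not part of the original answer):

```python
import numpy as np

a = np.array([[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])

# All distinct values in column 2, used as fixed output columns
values = np.unique(a[:, 1])

# One row per column-1 value: [k, count of values[0], count of values[1], ...]
out = np.array([
    [k, *[(a[a[:, 0] == k, 1] == v).sum() for v in values]]
    for k in np.unique(a[:, 0])
])
print(out)
```

A group that lacks some value then gets an explicit 0 in that column instead of a shorter row.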
A simple way is to use a dict comprehension with collections.Counter and np.unique:
from collections import Counter
out = {k: Counter(a[a[:,0] == k,1]) for k in np.unique(a[:,0])}
Out[821]:
{1: Counter({2: 1, 1: 1}),
 2: Counter({1: 1, 2: 1}),
 3: Counter({2: 2, 1: 1}),
 4: Counter({2: 1, 1: 1})}
Upvotes: 1
Reputation: 59731
Assuming your values in the first column go from 1 to N and in the second column from 1 to M, this is one very simple and fast way to do that:
import numpy as np
a = np.array([[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])
c = np.zeros(a.max(0), np.int32)
np.add.at(c, tuple(a.T - 1), 1)
# c[i, j] contains the number of times
# the second column value is j + 1 when
# the first column value is i + 1
# Print result
for i in range(c.shape[0]):
    print(f'Count result for {i + 1}')
    for j in range(c.shape[1]):
        print(f' Number of {j + 1}s: {c[i, j]}')
Output:
Count result for 1
Number of 1s: 1
Number of 2s: 1
Count result for 2
Number of 1s: 1
Number of 2s: 1
Count result for 3
Number of 1s: 1
Number of 2s: 2
Count result for 4
Number of 1s: 1
Number of 2s: 1
This works by creating an array c of zeros and then adding one to the entry of c indicated by each row of a. Conceptually, it is equivalent to c[a[:, 0] - 1, a[:, 1] - 1] += 1. However, that does not give the right counts here, because a contains repeated rows and NumPy applies the increment only once for each repeated position. To count correctly, you need the at method of the np.add ufunc (this method is available in other ufuncs too, see Universal functions (ufuncs)). It adds the given value at each position (tuple(a.T - 1) makes a tuple with the row indices and the column indices), counting repeated positions correctly.
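To make the difference concrete, here is a small sketch comparing the two approaches on the same array; the names c_buffered and c_at are only illustrative:

```python
import numpy as np

a = np.array([[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])
rows, cols = a[:, 0] - 1, a[:, 1] - 1

# Fancy-indexed in-place addition: the repeated position (2, 1) is incremented only once
c_buffered = np.zeros(a.max(0), np.int32)
c_buffered[rows, cols] += 1

# Unbuffered np.add.at: every occurrence of a position is counted
c_at = np.zeros(a.max(0), np.int32)
np.add.at(c_at, (rows, cols), 1)

print(c_buffered[2, 1])  # 1
print(c_at[2, 1])        # 2
```

The row [3, 2] appears twice in a, so only the np.add.at version reports 2 for that pair.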
Upvotes: 1
Reputation: 7896
You can filter the array with a condition and then use np.unique to get the counts. Try the solution below:
import numpy as np
a = np.array(
    [[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])
b = a[np.any(a == 3, axis=1)]
print(len(b[np.any(b == 2, axis=1)])) #output: 2
print(len(b[np.any(b == 1, axis=1)])) #output: 1
unique, counts = np.unique(b, return_counts=True)
print(dict(zip(unique, counts))) #output: {1: 1, 2: 2, 3: 3}
Short solution:
unique, counts = np.unique(a[np.any(a == 3, axis=1)], return_counts=True) #replace 3 with x
print(dict(zip(unique, counts)))
output:
{1: 1, 2: 2, 3: 3}
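Note that np.any(a == 3, axis=1) matches a 3 in either column, and np.unique on the filtered rows counts both columns, which is why 3 itself shows up in the result. If only column 1 should be matched and only column 2 counted, a variant along these lines may be closer to what the question asks (x and result are illustrative names):

```python
import numpy as np

a = np.array([[1, 2], [1, 1], [2, 1], [2, 2], [3, 2], [3, 2], [3, 1], [4, 2], [4, 1]])
x = 3  # value to look up in column 1

# Select rows whose first column equals x, then count only the second column
values, counts = np.unique(a[a[:, 0] == x, 1], return_counts=True)
result = {int(v): int(n) for v, n in zip(values, counts)}
print(result)  # {1: 1, 2: 2}
```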
Upvotes: 0