Reputation: 11
I have a numpy 2D array of arrays:
samples = np.array([[1,2,3], [2,3,4], [4,5,6], [1,2,3], [2,3,4], [2,3,4]])
I need to count how many times each inner array occurs in the array above, like:
counts = [[1,2,3]:2, [2,3,4]:3, [4,5,6]:1]
I'm not sure how to count these, or how to list them as shown above so that each array stays connected to its count. Any help is appreciated. Thank you!
Upvotes: 0
Views: 986
Reputation: 5949
Here is a method that is relatively fast compared with other pure-Python (non-numpy) solutions:
>>> from collections import Counter
>>> Counter(map(tuple, samples.tolist())) # convert to dict if you need it
Counter({(1, 2, 3): 2, (2, 3, 4): 3, (4, 5, 6): 1})
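As a minimal sketch of how the resulting Counter might be used, the snippet below looks up the count for one row, converts the Counter to a plain dict, and lists rows by frequency. The names counts_dict and most_common are chosen here for illustration and are not part of the answer above.

```python
from collections import Counter

import numpy as np

samples = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [1, 2, 3], [2, 3, 4], [2, 3, 4]])

# Rows are converted to tuples because tuples are hashable and can be dict keys.
counts = Counter(map(tuple, samples.tolist()))

# Look up the count for one specific row.
print(counts[(2, 3, 4)])  # 3

# Plain dict, if a Counter is not wanted (illustrative name).
counts_dict = dict(counts)

# (row, count) pairs ordered from most to least frequent (illustrative name).
most_common = counts.most_common()
print(most_common)
```

A row that never occurs gives 0 when looked up in the Counter, whereas the same lookup on the plain dict raises KeyError.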
Python does it quite fast too, because operations on tuples are well optimised:
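import numpy as np
from collections import Counter
import benchit
%matplotlib inline
benchit.setparams(rep=3)
sizes = [3, 10, 30, 100, 300, 900, 3000, 9000, 30000, 90000, 300000, 900000, 3000000]
arr = np.random.randint(0,10, size=(sizes[-1], 3)).astype(int)
# Pure-Python approach: rows become hashable tuples and are counted by Counter
def count_python(samples):
    return Counter(map(tuple, samples.tolist()))
# NumPy approach: unique rows together with their counts
def count_numpy(samples):
    return np.unique(samples, axis=0, return_counts=True)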
fns = [count_python, count_numpy]
in_ = {s: (arr[:s],) for s in sizes}
t = benchit.timings(fns, in_, multivar=True, input_name='Number of items')
t.plot(logx=True, figsize=(12, 6), fontsize=14)
Note that arr.tolist() alone accounts for about 0.8 s per 3M rows of the Python timing.
Upvotes: 0
Reputation: 61
Here's a way to do it without using much of the numpy library:
import numpy as np
samples = np.array([[1,2,3], [2,3,4], [4,5,6], [1,2,3], [2,3,4], [2,3,4]])
result = {}
for row in samples:
    inDictionary = False
    # Compare the row against every entry stored so far
    for check in range(len(result)):
        if np.all(result[str(check)][0] == row):
            result[str(check)][1] += 1
            inDictionary = True
    # First occurrence: store the row with a count of 1 under the next index
    if not inDictionary:
        result[str(len(result))] = [row, 1]
print("------------------")
print(result)
This method creates a dictionary called result, then loops through the rows of samples and checks whether each one is already in the dictionary. If it is, its count is increased by 1. Otherwise, a new entry is created for that row.
The saved rows and counts can now be accessed with result["index"], where the index is a string such as "0": result["index"][0] is the array value and result["index"][1] is the number of times it appeared.
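As a rough illustration of that access pattern, the sketch below uses a hand-built stand-in with the same shape as result (string index mapped to a [row, count] pair) rather than re-running the loop. The values match what the loop produces for the samples in the question.

```python
import numpy as np

# Hand-built stand-in with the same shape as the `result` dictionary:
# string index -> [row, count]
result = {
    "0": [np.array([1, 2, 3]), 2],
    "1": [np.array([2, 3, 4]), 3],
    "2": [np.array([4, 5, 6]), 1],
}

# Unpack one entry into its row and its count.
row, count = result["1"]
print(row, count)  # [2 3 4] 3

# Walk through every entry; the keys are strings, not integers.
for key, (row, count) in result.items():
    print(key, row.tolist(), count)
```

Because the keys are strings, result[1] would raise KeyError; result["1"] is the form that works.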
Upvotes: 0
Reputation: 31319
Everything you need is directly in numpy:
import numpy as np
a = np.array([[1,2,3], [2,3,4], [4,5,6], [1,2,3], [2,3,4], [2,3,4]])
print(np.unique(a, axis=0, return_counts=True))
Result:
(array([[1, 2, 3],
       [2, 3, 4],
       [4, 5, 6]]), array([2, 3, 1], dtype=int64))
The result is a tuple of an array with the unique rows, and an array with the counts of those rows.
If you need to go through them pairwise:
unique_rows, counts = np.unique(a, axis=0, return_counts=True)
for row, c in zip(unique_rows, counts):
    print(row, c)
Result:
[1 2 3] 2
[2 3 4] 3
[4 5 6] 1
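If a mapping closer to the one sketched in the question is wanted, the two arrays returned by np.unique can be combined into a dictionary. This is one possible sketch; counts_by_row is a name chosen here, and the rows are turned into tuples because arrays cannot be used as dictionary keys.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [1, 2, 3], [2, 3, 4], [2, 3, 4]])
unique_rows, counts = np.unique(a, axis=0, return_counts=True)

# tolist() converts NumPy values to plain Python ints, giving clean keys and values.
counts_by_row = {
    tuple(row): c for row, c in zip(unique_rows.tolist(), counts.tolist())
}
print(counts_by_row)  # {(1, 2, 3): 2, (2, 3, 4): 3, (4, 5, 6): 1}
```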
Upvotes: 2