Fastest way to count the number of occurrences of a character in a numpy.chararray

Question

Pythonists,

What is the fastest way to count the occurrences of a character in a numpy.chararray?

I am doing the following:

In [59]: for i in range(10):
...:     m = input("Enter A or B: ")
...:     rr[0][i] = m
...:     
Enter A or B: B
Enter A or B: B
Enter A or B: B
Enter A or B: A
Enter A or B: B
Enter A or B: A
Enter A or B: A
Enter A or B: A
Enter A or B: B
Enter A or B: A

In [60]: rr
Out[60]: 
chararray([['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A']],
          dtype='



I believe there must be a better way to achieve this with speed and elegance.

Divakar · Accepted Answer

It's better to stick to regular NumPy arrays rather than chararrays, as the NumPy documentation advises:

Note:

The chararray class exists for backwards compatibility with Numarray, it is not recommended for new development. Starting from numpy 1.4, if one needs arrays of strings, it is recommended to use arrays of dtype object_, string_ or unicode_, and use the free functions in the numpy.char module for fast vectorized string operations.
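As a minimal illustration of the note's recommendation, assuming the data has been put into a regular array of 1-character unicode strings (dtype '<U1'), the free function np.char.count counts how often a substring occurs in each element, and summing gives the total. The variable names are illustrative. This is more general than needed here (it also counts inside multi-character strings) and is not necessarily as fast as the comparison-based approaches.

```python
import numpy as np

# Sample data from the question, held in a regular NumPy array of
# 1-character unicode strings (dtype '<U1') instead of a chararray.
arr = np.array(['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'])

# np.char.count returns, element-wise, how many times 'A' occurs in each
# string; summing the per-element counts gives the total for the array.
per_element = np.char.count(arr, 'A')
total = int(per_element.sum())
print(total)  # 5
```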

Working with regular arrays, here are two approaches.

Approach #1

We could use np.count_nonzero to count the True values in the boolean mask produced by comparing the array against the search element 'A':

np.count_nonzero(rr=='A')
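A self-contained version of Approach #1, assuming the question's data is rebuilt as a regular unicode array rather than read interactively with input(). It also shows that the 2D shape from the question (one row of ten) makes no difference, since np.count_nonzero counts over all elements by default.

```python
import numpy as np

# The question's sample data as a regular 1D array (dtype '<U1').
rr = np.array(['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'])

# rr == 'A' gives a boolean mask; count_nonzero counts its True entries.
count_a = np.count_nonzero(rr == 'A')

# The question stored the data in a (1, 10) array; the count is the same.
rr_2d = rr.reshape(1, -1)
count_a_2d = np.count_nonzero(rr_2d == 'A')

print(count_a, count_a_2d)  # 5 5
```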

Approach #2

With the chararray holding only single-character elements, we can do even better by viewing it with the uint8 dtype and then comparing and counting. This is faster because the comparison then operates on plain numeric data instead of strings. The implementation would be:

np.count_nonzero(rr.view(np.uint8)==ord('A'))

On Python 2.x, it would be:

np.count_nonzero(np.array(rr.view(np.uint8))==ord('A'))
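A self-contained sketch of Approach #2 on Python 3, assuming the data sits in a regular NumPy array. A uint8 view yields exactly one value per element only for 1-byte strings (dtype 'S1'). For unicode strings (dtype '<U1'), NumPy stores each character as a 4-byte UCS-4 code point, so this sketch swaps in a little-endian 32-bit view ('<u4'); a uint8 view of such an array has four entries per character and, for non-ASCII text, could match a byte that is only part of a character. The variable names are illustrative.

```python
import numpy as np

values = ['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A']

# Byte strings (dtype 'S1'): one byte per element, so the uint8 view has the
# same length as the array and each entry is that character's byte value.
rr_bytes = np.array(values, dtype='S1')
count_bytes = np.count_nonzero(rr_bytes.view(np.uint8) == ord('A'))

# Unicode strings (dtype '<U1'): each element is a 4-byte little-endian
# UCS-4 code point, so a '<u4' view gives one code point per element.
rr_unicode = np.array(values, dtype='<U1')
count_unicode = np.count_nonzero(rr_unicode.view('<u4') == ord('A'))

print(count_bytes, count_unicode)  # 5 5
```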

Timings

Timings on the original sample data and on a version scaled up 10,000x:

# Original sample data
In [10]: rr
Out[10]: array(['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'], dtype='
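To run a comparison like this on your own machine, here is a minimal timing sketch using the standard timeit module. It assumes 1-byte strings (dtype 'S1') tiled 10,000x to mirror the scaled data; with a bytes array, the search element for Approach #1 must be the bytes literal b'A'. The helper names count_compare and count_view are hypothetical, and no particular timings are implied, since results depend on the machine and NumPy version.

```python
import timeit

import numpy as np

# Scaled data: the 10-element sample tiled 10,000x (100,000 elements).
base = np.array(['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'], dtype='S1')
big = np.tile(base, 10000)


def count_compare(arr):
    # Approach #1: compare against the bytes search element, count True values.
    return np.count_nonzero(arr == b'A')


def count_view(arr):
    # Approach #2: reinterpret the 1-byte strings as uint8 and compare numerically.
    return np.count_nonzero(arr.view(np.uint8) == ord('A'))


# Both approaches should agree before their speeds are compared.
assert count_compare(big) == count_view(big) == 50000

for func in (count_compare, count_view):
    seconds = timeit.timeit(lambda: func(big), number=100)
    print(f"{func.__name__}: {seconds / 100 * 1e6:.1f} us per call")
```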
