Assume there is a numpy array containing the following data:
import numpy as np
a = np.array([['2','W','A'],
['3', 'R', 'A'],
['4', 'W', 'R'],
['2', 'E', 'R'],
['4', 'E', 'Y'],
['3', 'E', 'Y']])
First, I want to count how many times each value occurs in the third column, so the result should be:
[['A' '2']
 ['R' '2']
 ['Y' '2']]
(For example, the value A appears in the third column twice, so the result is 'A' '2'.)
Second, I want to count each combination of values in the second and third columns:
[['W' 'A' '1']
 ['R' 'A' '1']
 ['W' 'R' '1']
 ['E' 'R' '1']
 ['E' 'Y' '2']]
For example, the value E in the second column together with the value Y in the third column appears twice, so the result is 'E' 'Y' '2'.
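For reference, both tallies can be written in plain Python with collections.Counter, which keeps each combination in the order it first appears, matching the expected outputs above. This is only a baseline sketch to check vectorized results against; tally is a hypothetical helper name, not part of numpy.

```python
from collections import Counter

import numpy as np

a = np.array([['2', 'W', 'A'],
              ['3', 'R', 'A'],
              ['4', 'W', 'R'],
              ['2', 'E', 'R'],
              ['4', 'E', 'Y'],
              ['3', 'E', 'Y']])


def tally(arr, columns):
    """Hypothetical helper: count each distinct combination of values in
    `columns`, returning [value..., count] rows of strings in the order
    each combination first appears (Counter preserves insertion order)."""
    counter = Counter(tuple(row) for row in arr[:, columns].tolist())
    return [list(key) + [str(n)] for key, n in counter.items()]


first = tally(a, [2])
second = tally(a, [1, 2])
print(first)   # [['A', '2'], ['R', '2'], ['Y', '2']]
print(second)  # [['W', 'A', '1'], ['R', 'A', '1'], ['W', 'R', '1'], ['E', 'R', '1'], ['E', 'Y', '2']]
```

This loops over the rows in Python, so it is mainly useful for small arrays or as a correctness check.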
(setup code)
import pandas as pd
import numpy as np
a = np.array(
[['2', 'W', 'A'],
['3', 'R', 'A'],
['4', 'W', 'R'],
['2', 'E', 'R'],
['4', 'E', 'Y'],
['3', 'E', 'Y']],
)
Pandas is well suited for this and uses numpy under the hood. For example, you can count the combinations in the second and third columns like this:
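df = pd.DataFrame(a)
cols = [1, 2]  # the second and third columns
# Count each distinct (column 1, column 2) pair, turn the counts into strings,
# move the pairs back into ordinary columns, and return a numpy array
df[cols].value_counts().astype("str").reset_index().values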
result:
array([['E', 'Y', '2'],
['W', 'R', '1'],
['W', 'A', '1'],
['R', 'A', '1'],
['E', 'R', '1']], dtype=object)
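The answer above shows only the two-column case; the first question is the same call on a single column. A minimal sketch, assuming pandas 1.1 or later (where DataFrame.value_counts is available); count_combinations is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

a = np.array([['2', 'W', 'A'],
              ['3', 'R', 'A'],
              ['4', 'W', 'R'],
              ['2', 'E', 'R'],
              ['4', 'E', 'Y'],
              ['3', 'E', 'Y']])
df = pd.DataFrame(a)


def count_combinations(frame, cols):
    # value_counts on a DataFrame counts each distinct row of the selected
    # columns; reset_index turns the index back into ordinary columns.
    counts = frame[cols].value_counts().reset_index()
    # Convert everything (including the counts) to strings, as in the question.
    return counts.astype(str).values.tolist()


first = count_combinations(df, [2])
second = count_combinations(df, [1, 2])
print(first)
print(second)
```

value_counts orders rows by descending count, so if you need a particular order among equal counts, sort the rows afterwards.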
Use np.unique with return_counts=True to count occurrences, then reshape the result:
import numpy as np
a = np.array([['2','W','A'],
['3', 'R', 'A'],
['4', 'W', 'R'],
['2', 'E', 'R'],
['4', 'E', 'Y'],
['3', 'E', 'Y']])
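# Count how often each value occurs in the third column
unique, counts = np.unique(a[:, 2], return_counts=True)
# Convert the counts to strings so the stacked array has a single string dtype
result = np.vstack([unique, counts.astype(str)]).T
print(result)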
As for the second question: if you want to avoid for loops and list comprehensions, stick to plain numpy, and are willing to give up your exact output format, you can join the two columns into one string key per row and count those keys:
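# Join the second and third columns element-wise into keys such as 'WA'
# (np.char.add is the public name for np.core.defchararray.add)
ind_col = np.char.add(a[:, 1], a[:, 2])
unique, counts = np.unique(ind_col, return_counts=True)
result1 = np.vstack([unique, counts.astype(str)]).T
print(result1)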
[['ER' '1']
 ['EY' '2']
 ['RA' '1']
 ['WA' '1']
 ['WR' '1']]
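Joining the columns without a separator is safe here only because every value is a single character. With longer values, different pairs can produce the same key ('AB' + 'C' and 'A' + 'BC' both give 'ABC'). A sketch of one workaround, assuming '|' never occurs in the data; the array b is made-up example data, and splitting the keys back into columns reintroduces a small list comprehension.

```python
import numpy as np

# Made-up data where plain concatenation collides.
b = np.array([['1', 'AB', 'C'],
              ['2', 'A', 'BC']])

plain = np.char.add(b[:, 1], b[:, 2])
print(np.unique(plain, return_counts=True))  # a single key 'ABC' counted twice

# Insert a separator that is assumed not to appear in the data.
keys = np.char.add(np.char.add(b[:, 1], '|'), b[:, 2])
unique, counts = np.unique(keys, return_counts=True)

# Split the keys back into their two columns to restore the original layout.
pairs = [k.split('|') for k in unique.tolist()]
result = [p + [str(n)] for p, n in zip(pairs, counts.tolist())]
print(result)  # [['AB', 'C', '1'], ['A', 'BC', '1']]
```

Note that the sorted order of the keys depends on the separator character ('|' sorts after letters), so the rows may not come out in the order you would get by sorting on the original columns.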
You can use np.unique() with axis=0 to tabulate the unique rows of the selected columns, which covers both questions:
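def tabulate_array(x, columns):
    # Unique rows of the selected columns and how often each one occurs
    idx, counts = np.unique(x[:, columns], return_counts=True, axis=0)
    # Append each count as one string; list(str(12)) would split it into '1', '2'
    return [row + [str(n)] for row, n in zip(idx.tolist(), counts.tolist())]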
The last step, which appends each count to its row, could be refined further, but it already returns string output, for example:
tabulate_array(a,[2])
[['A', '2'], ['R', '2'], ['Y', '2']]
tabulate_array(a,[1,2])
[['E', 'R', '1'],
['E', 'Y', '2'],
['R', 'A', '1'],
['W', 'A', '1'],
['W', 'R', '1']]
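np.unique sorts its output, so the rows above differ in order from the question's expected result. A possible variant, sketched under the assumption that a 2-D string array is acceptable instead of a list of lists: return_index=True gives the position where each row first occurs, argsort on those positions restores first-appearance order, and np.column_stack attaches the counts without a list comprehension. tabulate_in_order is a hypothetical name.

```python
import numpy as np

a = np.array([['2', 'W', 'A'],
              ['3', 'R', 'A'],
              ['4', 'W', 'R'],
              ['2', 'E', 'R'],
              ['4', 'E', 'Y'],
              ['3', 'E', 'Y']])


def tabulate_in_order(x, columns):
    """Hypothetical variant of tabulate_array: a 2-D string array whose
    rows follow the order in which each combination first appears."""
    idx, first, counts = np.unique(
        x[:, columns], axis=0, return_index=True, return_counts=True
    )
    # Reorder the sorted unique rows by the position of their first occurrence.
    order = np.argsort(first)
    # Append the counts, converted to strings, as one extra column.
    return np.column_stack([idx[order], counts[order].astype(str)])


print(tabulate_in_order(a, [2]))
print(tabulate_in_order(a, [1, 2]))
```

Dropping the argsort step (and using idx and counts directly) gives the same sorted order as tabulate_array.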