Jack Arnestad

Reputation: 1865

Convert numpy array with values into array with frequency for each observation in each row

I have a numpy array as follows:

array = np.random.randint(6, size=(50, 400))

Each value in this array is the cluster that the value belongs to, with each row representing a sample and each column representing a feature. I would like to create an array with 5 columns (shape (50, 5) here) holding the frequency of each cluster in each sample, where each sample is a row of this matrix.

However, in the frequency calculation I want to ignore 0, meaning that the frequencies of the nonzero values (1-5) should sum to 1 in each row.

Essentially, what I want is an array with each column being a cluster (1-5 in this case) and each row still corresponding to a single sample.

How can this be done?

Edit:

small input:

input = np.random.randint(6, size=(2, 5))

array([[0, 4, 2, 3, 0],
       [5, 5, 2, 5, 3]])

output:

1    2    3    4    5

0   .33  .33  .33   0
0   .2   .2    0   .6

Where 1-5 are the column names, and the bottom two rows are the desired output in a numpy array.

Upvotes: 1

Views: 107

Answers (1)

chthonicdaemon

Reputation: 19770

This is a simple application of bincount. Does this do what you want?

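import numpy as np

def freqs(x):
    # Count occurrences of 0..5 in the row, then drop the count for 0
    counts = np.bincount(x, minlength=6)[1:]
    # Normalize so the frequencies of 1..5 sum to 1
    return counts/counts.sum()

frequencies = np.apply_along_axis(freqs, axis=1, arr=array)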
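As a quick check, here is the same idea applied to the small input from the question. The variable names `small` and `result` are placeholders for this sketch; the expected values in the comment follow from the desired output given in the question.

```python
import numpy as np

def freqs(x):
    # Count occurrences of 0..5 in the row, then drop the count for 0
    counts = np.bincount(x, minlength=6)[1:]
    # Normalize so the frequencies of 1..5 sum to 1
    return counts / counts.sum()

# The small input from the question
small = np.array([[0, 4, 2, 3, 0],
                  [5, 5, 2, 5, 3]])

result = np.apply_along_axis(freqs, axis=1, arr=small)
# Row 0 is [0, 1/3, 1/3, 1/3, 0] and row 1 is [0, 0.2, 0.2, 0, 0.6]
print(result)
```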

If you were wondering about the speed implications of apply_along_axis, this fully vectorized method, which uses broadcasting to compare every element against each cluster value, is marginally slower in my tests:

values = np.arange(1, 6)  # the cluster labels to count; 0 is left out
counts = (array[:, :, None] == values[None, None, :]).sum(axis=1)
frequencies2 = counts/counts.sum(axis=1)[:, None]
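One caveat with both versions: a row that contains only 0s has no nonzero values to count, so the division is 0/0 and produces nan for that row. Below is a minimal sketch of one way to guard against that, assuming you would rather get a row of zeros in that case. The helper name `safe_freqs`, its `n_clusters` parameter, and the choice of zeros are my assumptions, not something from the question.

```python
import numpy as np

def safe_freqs(array, n_clusters=5):
    # Cluster labels to count; 0 is deliberately excluded
    values = np.arange(1, n_clusters + 1)
    # counts[i, k] = number of times cluster values[k] appears in row i
    counts = (array[:, :, None] == values[None, None, :]).sum(axis=1)
    totals = counts.sum(axis=1, keepdims=True)
    # Divide only where the row has at least one nonzero value;
    # rows made up entirely of 0s keep the zeros from `out`
    out = np.zeros(counts.shape, dtype=float)
    np.divide(counts, totals, out=out, where=totals != 0)
    return out

example = np.array([[0, 0, 0, 0, 0],
                    [5, 5, 2, 5, 3]])
frequencies = safe_freqs(example)
print(frequencies)
```

For a 50 x 400 input like the one in the question, an all-zero row is unlikely with random data, but it may matter with real cluster assignments.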

Upvotes: 4
