Mr.Weathers

Reputation: 398

How can I reduce a numpy array based on a key rather than an axis?

I have a numpy array with 2 columns. The second column represents the keys that I want to reduce on.

>>> x
array([[0.1 , 1.  ],
       [0.25, 1.  ],
       [0.45, 0.  ],
       [0.55, 0.  ]])

I want to sum up all the values which share a key, like this.

>>> sum_key(x)
array([[0.35, 1.  ],
       [1.  , 0.  ]])

This seems like a relatively universal task, but I can't find a good name for it or see it discussed. Any ideas?

Upvotes: 2

Views: 398

Answers (4)

Stef

Reputation: 15525

A solution without numpy.

Grouping elements by key is typically done with a python dict.

Be careful if your keys are floating-point numbers. For instance, 1.000000001 and 1.0 will be distinct keys. I suggest rounding the keys to int first.
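To illustrate the floating-point caveat, here is a minimal sketch. The data is hypothetical (one key is deliberately perturbed to 1.000000001), and `raw_totals` / `rounded_totals` are just illustrative names:

```python
# Hypothetical data: the second key differs from 1.0 only by float noise.
rows = [(0.1, 1.0), (0.25, 1.000000001), (0.45, 0.0), (0.55, 0.0)]

raw_totals = {}
rounded_totals = {}
for value, key in rows:
    # Raw float keys: 1.0 and 1.000000001 land in separate buckets.
    raw_totals[key] = raw_totals.get(key, 0) + value
    # Rounded keys: both map to the integer 1.
    int_key = int(round(key))
    rounded_totals[int_key] = rounded_totals.get(int_key, 0) + value

print(len(raw_totals))      # 3 distinct keys
print(len(rounded_totals))  # 2 distinct keys
```

Rounding only makes sense if the keys are meant to be integers; for genuinely fractional keys you would need some other way to decide which keys count as equal.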

Using a dict

x = [[0.1 , 1  ],
     [0.25, 1  ],
     [0.45, 0  ],
     [0.55, 0  ]]

y = {}
for v, k in x:
    y[k] = y.get(k, 0) + v

print(y)
# {1: 0.35, 0: 1.0}

You can get an array again from dict y if you want:

import numpy as np

z = np.array([(v, k) for k, v in y.items()])

print(z)
# [[0.35 1.  ]
#  [1.   0.  ]]

Upvotes: 1

Joe

Reputation: 7161

If the keys are non-negative integers (or can easily be cast to integers, as in your case), the most convenient way is to use np.bincount with the values passed as weights.

import numpy as np

x = np.array([[0.1 , 1.  ],
             [0.25, 1.  ],
             [0.45, 0.  ],
             [0.55, 0.  ]])

v = x[:, 0]
i = x[:, 1]

counts = np.bincount(i.astype(int), v)

print(counts)

# prints [1.   0.35]; position i holds the sum for key i
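If the keys are not small non-negative integers, one possible extension (not part of the answer above, just a sketch) is to map them to consecutive indices with np.unique(..., return_inverse=True) before calling np.bincount. The name `sum_key` is borrowed from the question, and the key values 7.5 and -2.0 are made up for illustration:

```python
import numpy as np

def sum_key(x):
    """Sum column 0 of x, grouped by the key in column 1 (illustrative helper)."""
    values, keys = x[:, 0], x[:, 1]
    # unique_keys is sorted; inverse holds, for each row, the index of its key.
    unique_keys, inverse = np.unique(keys, return_inverse=True)
    sums = np.bincount(inverse.ravel(), weights=values)
    # One row per distinct key: [sum, key].
    return np.column_stack((sums, unique_keys))

# Hypothetical data with keys that bincount could not take directly.
x = np.array([[0.1, 7.5],
              [0.25, 7.5],
              [0.45, -2.0],
              [0.55, -2.0]])

result = sum_key(x)
print(result)
```

The rows come back sorted by key, not in order of first appearance. The same float-equality caveat applies: keys that differ by rounding noise are treated as distinct.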

Upvotes: 1

Anatoliy R

Reputation: 1789

import numpy as np
import pandas as pd

data = np.array([[0.1 , 1.  ],
       [0.25, 1.  ],
       [0.45, 0.  ],
       [0.55, 0.  ]])

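df = pd.DataFrame(data)

# Group rows by the key column (1) and sum the value column (0).
gr = df.groupby(1)[0].sum()

# The group keys, sorted ascending.
print(gr.index.values)

# Rebuild a [sum, key] array, one row per key.
data1 = np.column_stack((gr.values, gr.index.values))
print(data1)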

Upvotes: 0

ExplodingGayFish

Reputation: 2897

This is somewhat roundabout, but it does the job:

import numpy as np
x = np.array([[0.1 , 1.  ],
       [0.25, 1.  ],
       [0.45, 0.  ],
       [0.55, 0.  ]])
keys = x[:,1]
values = x[:,0]
keys_unique = np.unique(keys)
# Convert to plain floats so the printed list looks the same across NumPy versions.
print([[float(values[keys == k].sum()), float(k)] for k in keys_unique])

Output:

[[1.0, 0.0], [0.35, 1.0]]
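The per-key loop above can probably be avoided by sorting on the key and reducing each run of equal keys with a ufunc's reduceat method. This is a sketch of that alternative, not the method used above; the variable names are illustrative. It also works for reductions other than a sum, e.g. np.maximum:

```python
import numpy as np

x = np.array([[0.1, 1.0],
              [0.25, 1.0],
              [0.45, 0.0],
              [0.55, 0.0]])

# Sort rows by key so that equal keys are contiguous.
order = np.argsort(x[:, 1], kind="stable")
values, keys = x[order, 0], x[order, 1]

# starts[i] is the index where the run for unique_keys[i] begins.
unique_keys, starts = np.unique(keys, return_index=True)

# Reduce each run of values between consecutive start indices.
group_sums = np.add.reduceat(values, starts)
group_max = np.maximum.reduceat(values, starts)

print(np.column_stack((group_sums, unique_keys)))
print(np.column_stack((group_max, unique_keys)))
```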

Upvotes: 1
