Reputation: 398
I have a numpy array with 2 columns. The second column represents the keys that I want to reduce on.
>>> x
array([[0.1 , 1. ],
       [0.25, 1. ],
       [0.45, 0. ],
       [0.55, 0. ]])
I want to sum up all the values which share a key, like this.
>>> sum_key(x)
array([[0.35, 1. ],
       [1.  , 0. ]])
This seems like a relatively universal task, but I can't find a good name for it or see it discussed. Any ideas?
Upvotes: 2
Views: 398
Reputation: 15525
A solution without numpy.
Grouping elements by key is typically done with a Python dict.
Be careful if your keys are floating-point numbers: for instance, 1.000000001 and 1.0 are distinct keys. I suggest rounding them to int first.
x = [[0.1 , 1],
     [0.25, 1],
     [0.45, 0],
     [0.55, 0]]
y = {}
for v, k in x:
    y[k] = y.get(k, 0) + v  # accumulate the sum for each key
print(y)
# {1: 0.35, 0: 1.0}
You can get an array again from dict y if you want:
import numpy as np

z = np.array([(v, k) for k, v in y.items()])
print(z)
# [[0.35 1.  ]
#  [1.   0.  ]]
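As a side note on the float-key caveat above, here is a minimal sketch of the rounding approach (my addition, not part of the answer's code), using collections.defaultdict:
from collections import defaultdict

x = [[0.1 , 1.000000001],  # a float key that should collapse to 1
     [0.25, 1.0],
     [0.45, 0.0],
     [0.55, 0.0]]

y = defaultdict(float)
for v, k in x:
    y[int(round(k))] += v  # round the key so near-equal floats merge

print(dict(y))
# {1: 0.35, 0: 1.0}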
Upvotes: 1
Reputation: 7161
If the keys are non-negative integers (or can easily be cast to them, as in your case), the most convenient way is to use np.bincount with its weights argument.
import numpy as np

x = np.array([[0.1 , 1. ],
              [0.25, 1. ],
              [0.45, 0. ],
              [0.55, 0. ]])
v = x[:, 0]  # values to sum
i = x[:, 1]  # keys
counts = np.bincount(i.astype(int), v)  # sums v per integer key
print(counts)
# [1.   0.35]
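Since index n of the result holds the sum for key n, you can rebuild the two-column layout from the question like this (a sketch of my own, not part of the answer above):
result = np.column_stack((counts, np.arange(len(counts))))
print(result)
# [[1.   0.  ]
#  [0.35 1.  ]]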
Upvotes: 1
Reputation: 1789
import numpy as np
import pandas as pd

data = np.array([[0.1 , 1. ],
                 [0.25, 1. ],
                 [0.45, 0. ],
                 [0.55, 0. ]])
df = pd.DataFrame(data)
gr = df.groupby([1])[0].agg('sum')  # sum column 0, grouped by column 1
print(gr.keys().values)             # the unique keys: [0. 1.]
data1 = np.array([[gr[k], k] for k in gr.keys().values])
print(data1)
# [[1.   0.  ]
#  [0.35 1.  ]]
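A slightly more compact route over the same DataFrame, offered as a sketch rather than a correction, lets reset_index rebuild both columns and then reorders them:
data2 = df.groupby(1)[0].sum().reset_index()[[0, 1]].to_numpy()
print(data2)
# [[1.   0.  ]
#  [0.35 1.  ]]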
Upvotes: 0
Reputation: 2897
This is somewhat roundabout, but it does the job:
import numpy as np

x = np.array([[0.1 , 1. ],
              [0.25, 1. ],
              [0.45, 0. ],
              [0.55, 0. ]])
keys = x[:, 1]
values = x[:, 0]
keys_unique = np.unique(keys)  # sorted unique keys
print([[sum(values[keys == k]), k] for k in keys_unique])  # per-key sums
Output:
[[1.0, 0.0], [0.35, 1.0]]
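For larger arrays the same idea can be fully vectorized; this is a common idiom (np.unique with return_inverse feeding np.bincount), sketched here as an addition rather than the answer's own code:
uniq, inv = np.unique(keys, return_inverse=True)  # inv maps each row to its key's slot
sums = np.bincount(inv, weights=values)           # sum values per unique key
print(np.column_stack((sums, uniq)))
# [[1.   0.  ]
#  [0.35 1.  ]]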
Upvotes: 1