Francesco Di Lauro
Francesco Di Lauro

Reputation: 630

pandas product of a column with its index groupby

I am working with a dataframe, and had to do a groupby in order to make some operations on my data.

This is an example of my Dataframe:

 I  SI   deltas

 1  10   0.1
 1  14   0.1
 2  10   0.1
 2  18   0.3
 1  17   0.05
 2  30   0.3
 1  10   0.4
 1  14   0.2
 2  10   0.1
 2  18   0.2
 1  17   0.15

Now, for each I, I count the relative frequency of the SI in this way:

results = df.groupby(['I', 'SI'])[['deltas']].sum()
#for each I, we sum all the weights (Deltas)
denom = results.groupby('I')['deltas'].sum()
#for each I, we divide each  deltas by the sum, getting them normalized to one
results.deltas = results.deltas / denom

So my Dataframe now looks like this:

I = 1

             deltas 

  SI = 10     0.5
  SI = 14     0.3
  SI = 17     0.2

I = 2

             deltas 

  SI = 10     0.2
  SI = 18     0.5
  SI = 30     0.3

....

What I need to do is to print for each I the sum of deltas times their relative SI:

     I = 1       sum =    0.5 * 10 + 0.3*14 + 0.2*17 = 12.6 
     I = 2       sum =    0.2*10 + 18*0.5 + 30*0.3 = 21

But since now I am working with a dataframe where the indices are I and SI, I do not know how to use them. I tried this code:

     for idx2, j in enumerate(results.index.get_level_values(0).unique()):
        #print results.loc[j]
        f.write("%d\t"%(j)+results.loc[j].to_string(index=False)+'\n')

but I am not sure how should I proceed to get the indices values

Upvotes: 1

Views: 316

Answers (1)

jpp
jpp

Reputation: 164773

Let's assume you have an input dataframe df following your initial transformations. If SI is your index, elevate it to a column via df = df.reset_index() as an initial step.

   I  SI  weight
0  1  10     0.5
1  1  14     0.3
2  1  17     0.2
3  2  10     0.2
4  2  18     0.5
5  2  30     0.3

You can then calculate the product of SI and weight, then use GroupBy + sum:

res = df.assign(prod=df['SI']*df['weight'])\
        .groupby('I')['prod'].sum().reset_index()

print(res)

   I  prod
0  1  12.6
1  2  20.0

For a single dataframe in isolation, you can use np.dot for the dot product.

s = pd.Series([0.5, 0.3, 0.2], index=[10, 14, 17])
s.index.name = 'SI'

res = np.dot(s.index, s)  # 12.6

Upvotes: 1

Related Questions