tasha
tasha

Reputation: 25

How to sum the information at two consecutive positions in a dataframe

I have a pandas dataframe with position,k, y. For example

pos k y
123 0.7 0.5
124 0.4 0.1
125 0.3 0.2
126 0.4 0.1
128 0.3 0.6
130 0.4 0.9
131 0.3 0.2

i would like to sum the information at k and y like

123 1.1 0.6
125 0.7 0.3
128 0.3 0.6
130 0.7 1.1

so the output has only the first positions and the sum of value the first and its immediate consecutive number which follows it.

I tried grouping by pandas

for k,g in df.groupby(df['pos'] - np.arange(df.shape[0])):
   u=g.ix[0:,2:].sum()

but its groups all the consecutive numbers which I dont want

ALSO I NEED SOMETHING FAST AS I HAVE 2611774 ROW IN MY DATAFILE

Upvotes: 1

Views: 1796

Answers (3)

Maybe this is faster than a loop, but it won't sum positions 123 and 124 and then 130 and 131 as I think you expect, because it sums odd positions with its consecutive like 129 and 130, 131 and 132...

df = df.set_index('pos')
df_odd = df.loc[df.index.values % 2 == 1]
df_even = df.loc[df.index.values % 2 == 0]
df_even = df_even.set_index(df_even.index.values - 1)
df_odd.add(df_even, fill_value = 0)

Result:

pos k   y
123 1.1 0.6
125 0.7 0.3
127 0.3 0.6
129 0.4 0.9
131 0.3 0.2

Upvotes: 1

Pomadomaphin
Pomadomaphin

Reputation: 116

I have not used pandas before, but if you get the chance to use the data as a list then this should work.

def SumNext(L):
  N = xrange(len(L)-1)
  Output = [L[i]+L[i+1] for i in N]
  return Output

This function will give you a summation of consecutive elements if you input a list.

A=[1,1,2,3,5,8,13] SumNext(A) => [2,3,5,8,13]

then you just have to read out the values to wherever you like, it is much faster to do things in lists (as opposed to while loops) when you get lots of elements.

then you will just need to figure out the implementation of passing the output back to your data frame.

Upvotes: 0

Ankit Kumar Namdeo
Ankit Kumar Namdeo

Reputation: 1464

Hope this will solve your problem

import pandas as pd
df = pd.DataFrame( columns=['pos','k','y'])
cf = pd.DataFrame( columns=['pos','k','y'])
df['pos']=123, 124,125,126,128,130,131
df['k']=.7,.4,.3,.4,.3,.4,.3
df['y']=.5,.1,.2,.1,.6,.9,.2
row=0
while 1:
    if row+1<len(df):
        if(df.loc[row]['pos']+1==df.loc[row+1]['pos']):
            cf.loc[row]= df.loc[row]+df.loc[row+1]
            cf.loc[row]['pos']=df.loc[row]['pos']
            row=row+2
        else:
            cf.loc[row]= df.loc[row]
            row=row+1
    else:
        break
print cf

Upvotes: 1

Related Questions