Reputation: 25
I have a pandas dataframe with position,k, y. For example
pos k y
123 0.7 0.5
124 0.4 0.1
125 0.3 0.2
126 0.4 0.1
128 0.3 0.6
130 0.4 0.9
131 0.3 0.2
i would like to sum the information at k and y like
123 1.1 0.6
125 0.7 0.3
128 0.3 0.6
130 0.7 1.1
so the output has only the first positions and the sum of value the first and its immediate consecutive number which follows it.
I tried grouping by pandas
for k,g in df.groupby(df['pos'] - np.arange(df.shape[0])):
u=g.ix[0:,2:].sum()
but its groups all the consecutive numbers which I dont want
ALSO I NEED SOMETHING FAST AS I HAVE 2611774 ROW IN MY DATAFILE
Upvotes: 1
Views: 1796
Reputation: 413
Maybe this is faster than a loop, but it won't sum positions 123 and 124 and then 130 and 131 as I think you expect, because it sums odd positions with its consecutive like 129 and 130, 131 and 132...
df = df.set_index('pos')
df_odd = df.loc[df.index.values % 2 == 1]
df_even = df.loc[df.index.values % 2 == 0]
df_even = df_even.set_index(df_even.index.values - 1)
df_odd.add(df_even, fill_value = 0)
Result:
pos k y
123 1.1 0.6
125 0.7 0.3
127 0.3 0.6
129 0.4 0.9
131 0.3 0.2
Upvotes: 1
Reputation: 116
I have not used pandas before, but if you get the chance to use the data as a list then this should work.
def SumNext(L):
N = xrange(len(L)-1)
Output = [L[i]+L[i+1] for i in N]
return Output
This function will give you a summation of consecutive elements if you input a list.
A=[1,1,2,3,5,8,13] SumNext(A) => [2,3,5,8,13]
then you just have to read out the values to wherever you like, it is much faster to do things in lists (as opposed to while loops) when you get lots of elements.
then you will just need to figure out the implementation of passing the output back to your data frame.
Upvotes: 0
Reputation: 1464
Hope this will solve your problem
import pandas as pd
df = pd.DataFrame( columns=['pos','k','y'])
cf = pd.DataFrame( columns=['pos','k','y'])
df['pos']=123, 124,125,126,128,130,131
df['k']=.7,.4,.3,.4,.3,.4,.3
df['y']=.5,.1,.2,.1,.6,.9,.2
row=0
while 1:
if row+1<len(df):
if(df.loc[row]['pos']+1==df.loc[row+1]['pos']):
cf.loc[row]= df.loc[row]+df.loc[row+1]
cf.loc[row]['pos']=df.loc[row]['pos']
row=row+2
else:
cf.loc[row]= df.loc[row]
row=row+1
else:
break
print cf
Upvotes: 1