Reputation: 630
Edit: I have removed the 5th row from the sample data because it contained an error.
Assume that we have a directed graph G = (V, E) of edges E and vertices V. Assume that we have a Pandas DataFrame describing which nodes (u, v) are connected to each other and the value/weight of the corresponding edge e. Let the following be a representation of such a DataFrame.
#   from  to  weight
--------------------
0   0     1   1.0
1   1     2   0.5
2   2     3   0.2
3   0     4   1.3
4   4     5   0.9
Is it possible to add a column with accumulated weights, such that, for instance, row 2 has an accumulated weight of 1.7 = 0.2 + 0.5 + 1.0, since we have a path 0->1->2->3? Preferably in a vectorized way, so that the calculation scales. In other words, we should get the following DataFrame.
#   from  to  weight  accumulated
---------------------------------
0   0     1   1.0     1.0
1   1     2   0.5     1.5
2   2     3   0.2     1.7
3   0     4   1.3     1.3
4   4     5   0.9     2.2
We can assume that there is no other path to vertex 3, since the DataFrame is constructed such that only shortest paths are included.
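This assumption can be checked directly: if every vertex is the target of at most one edge, the accumulated weight of each row is well defined. A minimal sketch, assuming the sample data is held in a DataFrame named df (the names df and single_incoming are only illustrative):

```python
import pandas as pd

# Sample edge list from the table above.
df = pd.DataFrame({
    "from": [0, 1, 2, 0, 4],
    "to": [1, 2, 3, 4, 5],
    "weight": [1.0, 0.5, 0.2, 1.3, 0.9],
})

# True when no vertex appears more than once in the "to" column,
# i.e. every vertex has at most one incoming edge.
single_incoming = df["to"].is_unique
```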
So far I have written the following code, which uses DataFrame.apply and is therefore not vectorized. Previously calculated accumulated values are cached in a dictionary called accum_map.
def __set_accum(self, row):
    # Return the cached accumulated weight for this target node, if any.
    search = row["to"]
    if search in self.accum_map:
        return self.accum_map[search]
    from_node = row["from"]
    # Look up the edge that leads into from_node (at most one exists,
    # since only shortest paths are stored).
    parent = self.df[self.df["to"] == from_node]
    # A node without an incoming edge is a source and contributes 0.
    prev = 0.0 if parent.empty else self.__set_accum(parent.iloc[0])
    self.accum_map[search] = prev + row["weight"]
    return self.accum_map[search]

def set_accumulated(self):
    self.df["accumulated"] = self.df.apply(func=self.__set_accum, axis=1)
Upvotes: 0
Views: 108
Reputation: 1083
There is an issue with your dataset, so I created another example here:
import pandas as pd
import numpy as np
df = pd.DataFrame({'from': [0, 2, 3, 1, 0],
                   'to': [1, 2, 4, 4, 2],
                   'val': [10, 20, 30, 40, 50]})
If you do not need a groupby, it is easier to extract the series, sum over the index slices, and then add the result back to the main DataFrame.
# Extract the values and the slice bounds:
s = df['val']
st = df['from']
gt = (df['to'] + 1)
# Sum the values in each positional slice [from, to]:
out = list()
for i, j in zip(st, gt):
    out.append(s[i:j].sum())
# Add the `new` column from the result
df['new'] = out
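For the edge list in the question, a different technique from the slicing above may be closer to what is asked: pointer doubling (also called pointer jumping). Each row first holds only its own weight and a pointer to the row that enters its "from" node. Every pass then adds the parent's partial sum and jumps to the parent's parent. This is only a sketch, and it assumes that every node has at most one incoming edge and that the graph has no cycles. The function name accumulated_weights and the sentinel slot are illustrative choices, not part of pandas.

```python
import numpy as np
import pandas as pd


def accumulated_weights(edges: pd.DataFrame) -> pd.Series:
    """Sum the weights along the unique path ending at each edge."""
    n = len(edges)
    # Position of the edge that arrives at each node.
    # Assumes the values in "to" are unique.
    edge_at = pd.Series(np.arange(n), index=edges["to"].to_numpy())
    # parent[i] is the position of the edge entering edges["from"][i];
    # n is a sentinel meaning "no incoming edge".
    parent = edges["from"].map(edge_at).fillna(n).astype(int).to_numpy()
    # Extra slot n is the sentinel: weight 0, pointing to itself.
    parent = np.append(parent, n)
    acc = np.append(edges["weight"].to_numpy(dtype=float), 0.0)
    # Each pass doubles the number of ancestors covered by acc.
    # Assumes the graph is acyclic; otherwise this loop never ends.
    while (parent[:n] != n).any():
        acc = acc + acc[parent]
        parent = parent[parent]
    return pd.Series(acc[:n], index=edges.index)


df = pd.DataFrame({
    "from": [0, 1, 2, 0, 4],
    "to": [1, 2, 3, 4, 5],
    "weight": [1.0, 0.5, 0.2, 1.3, 0.9],
})
df["accumulated"] = accumulated_weights(df)
```

Each pass is a pair of NumPy indexing operations, and the number of passes grows with the logarithm of the longest path rather than with the number of rows.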
Upvotes: 0