Reputation: 5143
I would like to take a DataFrame whose rows hold breadcrumb arrays and frequencies, and find the cumulative sum per level of the breadcrumb. To clarify: a breadcrumb is a series of parent-child relations within a tree, with each node having an associated frequency. The tree itself is not uniform:
pandasdf.A[1] = ['a','b','c','d']
pandasdf.A[2] = ['a','b','c']
pandasdf.A[3] = ['x','y','z','q']
pandasdf.A[4] = ['x','l']
pandasdf.B[1] = 12 # corresponding to 'd'
pandasdf.B[2] = 7 # corresponding to 'c'
pandasdf.B[3] = 2 # corresponding to 'q'
pandasdf.B[4] = 9 # corresponding to 'l'
The breadcrumbs are unique, so we don't have to worry about duplication. I'd like to get a series that corresponds to the cumulative sum of all the parent's children. In this case, the value for pandasdf.A == ['a'] would be 19, and the value for pandasdf.A == ['a', 'b'] would be 19 as well.
Upvotes: 2
Views: 993
Reputation: 880249
import pandas as pd
df = pd.DataFrame({
    'A': [['a','b','c','d'],['a','b','c'],['x','y','z','q'],['x','l']],
    'B': [12,7,2,9]
})
print(df)
# A B
# 0 [a, b, c, d] 12
# 1 [a, b, c] 7
# 2 [x, y, z, q] 2
# 3 [x, l] 9
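def cumulative_frequence(df, nodes):
    nodes = set(nodes)
    # Select the rows whose breadcrumb shares at least one node with `nodes`
    mask = df['A'].apply(lambda group: not nodes.isdisjoint(group))
    # Sum the frequencies of the selected rows and return a plain Python number
    return df.loc[mask, ['B']].sum().item()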
print(cumulative_frequence(df, ['a']))
print(cumulative_frequence(df, ['a','b']))
# 19
# 19
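Note that the mask above selects any breadcrumb sharing at least one node with `nodes`, which matches a true parent-child lookup only if node names are not reused elsewhere in the tree. As a minimal sketch of a stricter variant, assuming a breadcrumb should count toward a parent only when it starts with that parent's path, the totals for every prefix can be accumulated in one pass. The names `prefix_totals`, `totals` and `per_row` are hypothetical and not part of the code above:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c', 'd'], ['a', 'b', 'c'], ['x', 'y', 'z', 'q'], ['x', 'l']],
    'B': [12, 7, 2, 9],
})

def prefix_totals(df):
    """Map each breadcrumb prefix (as a tuple) to the summed frequency
    of every row whose breadcrumb starts with that prefix."""
    totals = {}
    for crumb, freq in zip(df['A'], df['B']):
        # Credit this row's frequency to each ancestor path and to the row itself
        for depth in range(1, len(crumb) + 1):
            prefix = tuple(crumb[:depth])
            totals[prefix] = totals.get(prefix, 0) + freq
    return totals

totals = prefix_totals(df)
print(totals[('a',)])       # 19
print(totals[('a', 'b')])   # 19
print(totals[('x',)])       # 11

# A Series aligned with the rows of df: the total for each row's own breadcrumb
per_row = df['A'].map(lambda crumb: totals[tuple(crumb)])
print(per_row.tolist())     # [12, 19, 2, 9]
```

The totals are kept in a plain dict keyed by tuples because lists are not hashable. Each row's own frequency is included in its total, so ['a', 'b', 'c'] gets 7 + 12 = 19.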
Upvotes: 2