conditional cumulative sum for pandas array

Question

I would like to take a dataframe of breadcrumb arrays and frequencies and find the cumulative sum per level of the breadcrumb. To clarify: a breadcrumb is a series of parent-child relations within a tree, with each node having an associated frequency. The tree itself is not uniform:

pandasdf.A[1] = ['a','b','c','d']
pandasdf.A[2] = ['a','b','c']
pandasdf.A[3] = ['x','y','z','q']
pandasdf.A[4] = ['x','l']
pandasdf.B[1] = 12 # corresponding to 'd'
pandasdf.B[2] = 7 # corresponding to 'c'
pandasdf.B[3] = 2 # corresponding to 'q'
pandasdf.B[4] = 9 # corresponding to 'l'

The breadcrumbs are unique (so we don't have to worry about duplication). I'd like to get a series that corresponds to the cumulative sum of all the parent's children, i.e. in this case, wherever pandasdf.A == ['a'] the value will be 19, and wherever pandasdf.A == ['a', 'b'] it will be 19 as well.

unutbu · Accepted Answer

import pandas as pd
df = pd.DataFrame({
    'A': [['a','b','c','d'],['a','b','c'],['x','y','z','q'],['x','l']],
    'B': [12,7,2,9]
    })
print(df)

#               A   B
# 0  [a, b, c, d]  12
# 1     [a, b, c]   7
# 2  [x, y, z, q]   2
# 3        [x, l]   9

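def cumulative_frequence(df, nodes):
    nodes = set(nodes)
    # Select every row whose breadcrumb shares at least one node with `nodes`.
    mask = df['A'].apply(lambda group: not nodes.isdisjoint(group))
    # .ix was removed in pandas 1.0; .loc does the same boolean/label selection.
    return df.loc[mask, ['B']].sum().item()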

print(cumulative_frequence(df, ['a']))
print(cumulative_frequence(df, ['a','b']))
# 19
# 19
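
Note that the isdisjoint test above selects any row sharing at least one node with the query, so cumulative_frequence(df, ['b', 'x']) would sum all four rows. If the query is meant to be a strict prefix of the breadcrumb, and a total is wanted for every level at once, something like the following sketch may be closer to what the question describes. The names prefix_totals, totals and per_row are hypothetical, not part of the answer above, and the sketch assumes each frequency in B should be counted once for every ancestor prefix of its breadcrumb.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'A': [['a','b','c','d'], ['a','b','c'], ['x','y','z','q'], ['x','l']],
    'B': [12, 7, 2, 9],
})

def prefix_totals(df):
    """Map each breadcrumb prefix (as a tuple) to the summed frequency
    of every row whose breadcrumb starts with that prefix."""
    totals = defaultdict(int)
    for crumb, freq in zip(df['A'], df['B']):
        # Credit this row's frequency to each of its prefixes:
        # ('a',), ('a', 'b'), ('a', 'b', 'c'), ...
        for depth in range(1, len(crumb) + 1):
            totals[tuple(crumb[:depth])] += freq
    return totals

totals = prefix_totals(df)
print(totals[('a',)])       # total under 'a'
print(totals[('a', 'b')])   # total under 'a' -> 'b'

# A Series aligned with df: the cumulative total at each row's own breadcrumb.
per_row = df['A'].apply(lambda crumb: totals[tuple(crumb)])
print(per_row)
```

Tuples are used as keys because lists are not hashable. Building the whole mapping is a single pass over the rows, which avoids rescanning the dataframe for each query.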
