Reputation: 1001

cumsum() over several columns

I want to create the following dataframe:

Index       A        B        C
 a          0        0        0
 b          1       10        0
 c         -1        0      -10
 d          1       20        0
 e          0        0        0
 f         -1        0      -20
 g          0        0        0
 h          1       15        0
 i         -1        0      -15

A and B are given. C shall be generated via a function and without iteration. Possibly in this fashion:

def generate_C():
    C = np.where(A == -1, << here prior value from B * -1 >>, 0)
    df['C] = C
    return df

I tried via cumsum() over columns B and C but did not come far. Any suggestions please?

Upvotes: 3

Answers (3)

jezrael

Reputation: 863751

You can use:

df.loc[df.A==-1, 'C'] = (-df.loc[df.A==1, 'B']).values
df.C.fillna(0, inplace=True)
print (df)
       A   B    C   
Index                 
a      0   0   0.0
b      1  10   0.0
c     -1   0 -10.0
d      1  20   0.0
e      0   0   0.0
f     -1   0 -20.0

Upvotes: 5

Jan Christoph Terasa

Reputation: 5945

It's easy to do in numpy, but I have yet to find a way to find a way to do it directly in pandas, because apparently pandas somehow ignores the fancy indexing:

def generate_C(df, inplace=False):
    import numpy

    if not inplace:
        df = df.copy()

    A, B = df.values.T
    C = numpy.zeros_like(A)
    C[A==-1] = -B[A==1]
    df['C'] = C

    return df

EDIT:

I found a way to to it with pure pandas:

def generate_C(df, inplace=False):
    if not inplace:
        df = df.copy()

    df['C'] = (-df.B[df.A==1]).reindex(df.A[df.A==-1].index,method='pad')
    df['C'].fillna(0, inplace=True)

    return df

Upvotes: 2

piRSquared

Reputation: 294556

Try:

df.C = (df.B.replace(0, np.nan).ffill().shift() * (df.A == -1) * -1).fillna(0)

confirmed jezrael's suggestion:

df.C = (df.B.replace(0, np.nan).ffill() * (df.A == -1) * -1).fillna(0)

works just as well. Since we are forward filling, the shift became unnecessary.

confirmed ColonelBeauvel's suggestion:

df.C = np.where(df.A==-1, -df.B.replace(0, method='ffill').shift(), 0)

Also works and would be my vote as the answer of choice had it been posted as an answer and actually my choice.

Upvotes: 3

cumsum() over several columns

Answers (3)

Related Questions