Furqan Hashim
Furqan Hashim

Reputation: 1318

Fill Zero values in Pandas column based on last non-zero value if a criteria is fulfilled

Consider a Pandas DataFrame test = pd.DataFrame(data = [0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0], columns = ['holding'])

Output:

+----------+
| Holdings |
+----------+
|        0 |
|        0 |
|        1 |
|        0 |
|        0 |
|        0 |
|       -1 |
|        0 |
|        0 |
|        0 |
|        1 |
|        0 |
|        0 |
+----------+

I want to replace all the zero values with the last non-zero value if last non-zero value is equal to 1. If the last non-zero value is equal to -1, then there is no need to replace 0 with 1.

I've tried test['position_holding'] = test['holding'].replace(to_replace=0, method='ffill') which resulted in

+------------------+
| position_holding |
+------------------+
|                0 |
|                0 |
|                1 |
|                1 |
|                1 |
|                1 |
|               -1 |
|               -1 |
|               -1 |
|               -1 |
|                1 |
|                1 |
|                1 |
+------------------+

The only thing I need to fix in the above table is zero filled with -1 which violates the 2nd condition. How can I achieve that?

Desired Output:
+------------------+
| position_holding |
+------------------+
|                0 |
|                0 |
|                1 |
|                1 |
|                1 |
|                1 |
|               -1 |
|                0 |
|                0 |
|                0 |
|                1 |
|                1 |
|                1 |
+------------------+

Upvotes: 1

Views: 1592

Answers (3)

xzelda
xzelda

Reputation: 182

This should work

test = pd.DataFrame(data = [0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0], 
                    columns = ['holding'])
test['position_holding'] = test['holding'].replace(to_replace=0, method='ffill')

test["Diff"] = test["holding"]-test["position_holding"]
test.loc[test["Diff"]==1, 'position_holding']=0

Then you can drop the Diff column which is now useless.

Upvotes: 0

Uddeshya Singh
Uddeshya Singh

Reputation: 59

Not using pandas or numpy but a simple for loop works too.

for i in range(1, len(test)):
    if(test['holding'][i] == 0 and test['holding'][i-1] == 1):
        test['holding'][i] = 1

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150785

My approach:

after = test.holding.eq(1)
before = test.holding.eq(-1)

test['pos_holding'] = test.holding.mask(test.holding.where(after|before).ffill()==1,1)

Equivalent code, a bit shorter:

mask = test.holding.where(test.holding != 0).ffill()
test['pos_holding'] = test.holding.mask(mask==1, 1)

Output:

    holding  pos_holding
0         0            0
1         0            0
2         1            1
3         0            1
4         0            1
5         0            1
6        -1           -1
7         0            0
8         0            0
9         0            0
10        1            1
11        0            1
12        0            1

Upvotes: 2

Related Questions