Reputation: 1051

Stop apply when condition is met

I have a pandas DataFrame with 1,000 columns and 30 million rows. I need to perform some operation (say, addition or multiplication) on each column. If any value becomes 0 after an operation, I need to stop applying operations to the remaining columns and rows. I would also like to know at which column and row the value became 0.

I have used iterrows with a few checks, but performance is a problem because there is so much data. Are there any alternatives to apply and iterrows?

ID   PID     PC   TID
10   1005   8017  3
11   10335  5019  2
12   1000   8017  1
13   243    8870  1
14   4918   8305  3
15   9017   8305  3

Apply operations column-wise:

Col1: subtract 9.
Col2: subtract 1000.
Col3: divide by 100.
Col4: subtract 1.

After the operation on the second column, its third value becomes 0. The whole process should stop at that point and report column 2, row 3.

Output if the operations are performed column-wise:

ID   PID    PC     TID
1    5      8017   3
2    9335   5019   2
3    0      8017   1
4    243    8870   1
5    4918   8305   3
6    9017   8305   3

Output if the operations are performed row-wise:

ID   PID    PC      TID
1    5      80.17   2
2    9335   50.19   1
3    0      8017    1
13   243    8870    1
14   4918   8305    3
15   9017   8305    3

Upvotes: 3

Answers (2)

BENY

Reputation: 323326

This is my solution, as I mentioned in the comment:

df1 = df.copy()  # keep the original values
# Apply the operations from the question to each column
df['ID'] -= 9
df['PID'] -= 1000
df['PC'] /= 100
df['TID'] -= 1

s=df.eq(0).idxmax(axis=0)
s
Out[492]:
ID     0
PID    2
PC     0
TID    2
dtype: int64

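# Restore the original values from the first zero onward in each column
# (Series.iteritems() was removed in pandas 2.0; items() is the replacement)
for x, i in s.items():
    df.loc[i:, x] = df1.loc[i:, x]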
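
One caveat with the `idxmax` call above: for a column that contains no zero at all, `df.eq(0).idxmax()` still returns the first index label, which is why `ID` and `PC` show 0 in the output. A minimal sketch of filtering with `any()` so that only columns that actually reach zero are reported follows. The variable names and the inline sample data are illustrative assumptions, and the divisor for `PC` is taken from the question (100).

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [10, 11, 12, 13, 14, 15],
    "PID": [1005, 10335, 1000, 243, 4918, 9017],
    "PC": [8017, 5019, 8017, 8870, 8305, 8305],
    "TID": [3, 2, 1, 1, 3, 3],
})

# Results of the four column operations, computed without touching df.
after = pd.DataFrame({
    "ID": df["ID"] - 9,
    "PID": df["PID"] - 1000,
    "PC": df["PC"] / 100,
    "TID": df["TID"] - 1,
})

is_zero = after.eq(0)
has_zero = is_zero.any()                 # per column: does any zero occur?
first_zero = is_zero.idxmax()[has_zero]  # drop columns that never hit zero
print(first_zero)

if not first_zero.empty:
    col = first_zero.index[0]            # first column (in order) with a zero
    print(col, first_zero[col])
```

On the sample data only `PID` and `TID` remain after filtering, and `PID` is the first column in which a zero appears.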

Upvotes: 1

Ami Tavory

Reputation: 76346

Since you have many more rows than columns, and vectorized operations are much faster than row-by-row iteration, I'd suggest looping over the columns and vectorizing within each one:

for c in df.columns:
    res = <apply function on df[c]>
    if (res != 0).all():  # No zero found
        df[c] = res
        continue
    # Zero found - apply only up to it.
    before_zero = (res == 0).cumsum() == 0  # True for rows before the first 0
    df.loc[before_zero, c] = res[before_zero]  # Later rows keep their old values
    break
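
The column loop above can be fleshed out into a runnable sketch. The helper name `apply_until_zero`, the `ops` mapping from column name to function, and the choice to keep the zero itself in the result (to match the expected output in the question) are all assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def apply_until_zero(df, ops):
    """Apply ops[col] to each column in order, stopping at the first zero.

    Hypothetical helper: returns (new_frame, location), where location is
    (column, row_label) of the first zero, or None if no zero appears.
    """
    out = df.copy()
    for col in df.columns:
        res = ops[col](df[col])          # vectorized over the whole column
        is_zero = (res == 0).to_numpy()
        if not is_zero.any():            # no zero: keep the full result
            out[col] = res
            continue
        pos = int(is_zero.argmax())      # position of the first zero
        # Keep results up to and including the zero; later rows keep
        # their original values.
        keep = np.arange(len(df)) <= pos
        out[col] = np.where(keep, res, df[col])
        return out, (col, df.index[pos])
    return out, None


# Sample data and operations from the question.
df = pd.DataFrame({
    "ID": [10, 11, 12, 13, 14, 15],
    "PID": [1005, 10335, 1000, 243, 4918, 9017],
    "PC": [8017, 5019, 8017, 8870, 8305, 8305],
    "TID": [3, 2, 1, 1, 3, 3],
})
ops = {
    "ID": lambda s: s - 9,
    "PID": lambda s: s - 1000,
    "PC": lambda s: s / 100,
    "TID": lambda s: s - 1,
}

result, where = apply_until_zero(df, ops)
print(where)
print(result)
```

On the sample data this stops in column `PID` at row label 2 (the third row) and leaves `PC` and `TID` untouched. Note that the column containing the zero is still computed in full before being truncated; only the remaining columns are skipped.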

Upvotes: 1
