Very long execution during loop over a pandas dataframe

Question

I'm fairly new to Python and pandas, and I'm trying to do something very simple: loop over a column of a pandas dataframe called df and modify some of its values, as in the following snippet:

for i in range(0, len(df.time) - 1):
    if df.time[i] == df.time[i+1]:
        df.at[i, 'time'] = df.time[i] - 1

df is the dataframe, which has the column "time". I'm looking for repeated ticks in time: if two consecutive timesteps have the same value, I decrement the first one by 1.

The problem is that it takes far too long! I ran it for over 20 minutes and it still hadn't finished. In Matlab, the same thing runs in seconds. Why is that, and how can I fix it? I should also mention that the dataframe has over 9 million rows.

Thanks in advance.

MaxU - stand with Ukraine · Accepted Answer

Is that what you want?

In [83]: df['new'] = df['time']

In [84]: df.loc[df.time.diff(-1).eq(0), 'new'] = df.loc[df.time.diff(-1).eq(0), 'time'] - 1

In [85]: df
Out[85]:
   time  new
0     1    1
1     2    2
2     4    3
3     4    4
4     5    5
5     7    6
6     7    7
7     8    8
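
For context on why this is so much faster: the original loop does a separate label lookup (df.time[i]) and a separate scalar write (df.at[...]) in interpreted Python for every row, so the per-row overhead is paid about 9 million times. The boolean-mask approach above does the comparison and the assignment in a few whole-column operations. Below is a minimal sketch of the same idea that overwrites the "time" column directly, using shift(-1) instead of diff(-1). The helper name decrement_repeated_ticks and the sample data are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def decrement_repeated_ticks(df: pd.DataFrame, column: str = "time") -> pd.DataFrame:
    """Return a copy of df where every value equal to its successor is reduced by 1.

    Hypothetical helper, written only to illustrate the vectorized approach.
    """
    out = df.copy()
    # True where a row's value equals the value in the next row.
    # The last row is compared against NaN, so it is never selected.
    repeated = out[column].eq(out[column].shift(-1))
    # Decrement all selected rows in one vectorized assignment.
    out.loc[repeated, column] -= 1
    return out


# Sample data matching the "time" column shown in the answer above.
df = pd.DataFrame({"time": [1, 2, 4, 4, 5, 7, 7, 8]})
result = decrement_repeated_ticks(df)
print(result["time"].tolist())  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the loop only ever compares a row with rows that have not been modified yet, building the mask from the original column in one pass should give the same result as the row-by-row version.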
