Reputation: 112
I'm fairly new to Python and pandas, and I'm trying to do something very simple: loop over a column of a pandas DataFrame called df and modify a value, as in the following snippet:
for i in range(0, len(df.time) - 1):
    if df.time[i] == df.time[i+1]:
        df.at[i, 'time'] = df.time[i] - 1
df is the DataFrame, which has the column "time". I'm looking for repeated ticks in time: if two consecutive timesteps have the same value, I decrement the first by 1.
The problem is that it takes far too long! I ran it for over 20 minutes and it didn't finish. In MATLAB, the same thing runs in seconds. Why is that, and how can I fix it? I should also say that this DataFrame has over 9 million rows.
Thanks in advance.
Upvotes: 0
Views: 100
Reputation: 210852
Is that what you want?
In [83]: df['new'] = df['time']
In [84]: df.loc[df.time.diff(-1).eq(0), 'new'] = df.loc[df.time.diff(-1).eq(0), 'time'] - 1
In [85]: df
Out[85]:
   time  new
0     1    1
1     2    2
2     4    3
3     4    4
4     5    5
5     7    6
6     7    7
7     8    8
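If you'd rather update the column itself, the same idea can be wrapped in a small function that computes the mask only once. This is just a minimal sketch, assuming the "time" column is numeric; the function name decrement_repeated_ticks and the sample data are made up for illustration (the values match the example above).

```python
import pandas as pd


def decrement_repeated_ticks(df, column="time"):
    """Return a copy of df in which every value equal to the next row's value is decreased by 1."""
    out = df.copy()
    # diff(-1) computes value[i] - value[i+1]; the last row yields NaN,
    # so it is never flagged as a repeat.
    repeated = out[column].diff(-1).eq(0)
    # Vectorized update: decrement only the flagged rows, no Python-level loop.
    out.loc[repeated, column] -= 1
    return out


# Hypothetical sample data, same values as the "time" column shown above.
df = pd.DataFrame({"time": [1, 2, 4, 4, 5, 7, 7, 8]})
result = decrement_repeated_ticks(df)
print(result["time"].tolist())  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because each comparison looks only at the original next value, this should give the same result as the loop in the question, including for runs of three or more equal values, while avoiding millions of individual element lookups.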
Upvotes: 1