Reputation: 115
This codes look right to me, but it's taking me an awful time to run.
In both cases, I'm dealing with 78k rows of data. I've managed to reduce the columns to 2-4 to simplify the code. Here, I'm leaving the first and second columns alone and trying to substitute the third column with the derivative. I would do it monthly until the ProjID changes, telling the code to
for j in range (1,len(joined)):
if joined['ProjID'][j] == joined['ProjID'][j-1]:
joined.loc[j]=[joined.ProjID[j], joined.Month[j], (askingrent[j]-askingrent[j-1])/askingrent[j-1]]
else:
joined.loc[j]=[joined.ProjID[j], joined.Month[j], 0]
Here, I have 78k of rows again. But, I'm trying to simply convert the column into datetime and delete the time (hours and minutes). The code looks simple enough; but I've been waiting for 30minutes~ish. Is the speed relevant to code or something else?
ageofbuildings['Month']=pd.to_datetime(ageofbuildings['Month'])
for i in range (0, len(ageofbuildings)):
ageofbuildings.Month[i]=ageofbuildings.Month[i].date()
Upvotes: 0
Views: 75
Reputation: 8906
You need to use vectorized operations to speed your code up. Firstly, it looks like the 'Month'
and 'ProjID'
columns are assigned to themselves whether the if clause is True or not so this is redundant.
The vector equivalent to the if clause would be
same_as_prev = joined['ProjID'] == joined['ProjID'].shift()
To get the vector values to sub in you can do
values = askingrent / askingrent.shift() - 1
assuming askingrent
is a pandas Series. Then you can do
joined.iloc[:, 2] = np.where(same_as_prev, values, 0)
I think the second code block can be achieved by
ageofbuildings['Month'] = pd.DatetimeIndex(ageofbuildings['Month']).date
but it's hard to know as you haven't given a look at your input/output.
Upvotes: 3