Bryan
Bryan

Reputation: 115

Improving the speed/optimizing this code

This codes look right to me, but it's taking me an awful time to run.

In both cases, I'm dealing with 78k rows of data. I've managed to reduce the columns to 2-4 to simplify the code. Here, I'm leaving the first and second columns alone and trying to substitute the third column with the derivative. I would do it monthly until the ProjID changes, telling the code to

for j in range (1,len(joined)):
    if joined['ProjID'][j] == joined['ProjID'][j-1]:
        joined.loc[j]=[joined.ProjID[j], joined.Month[j], (askingrent[j]-askingrent[j-1])/askingrent[j-1]]
    else:
        joined.loc[j]=[joined.ProjID[j], joined.Month[j], 0]

Here, I have 78k of rows again. But, I'm trying to simply convert the column into datetime and delete the time (hours and minutes). The code looks simple enough; but I've been waiting for 30minutes~ish. Is the speed relevant to code or something else?

ageofbuildings['Month']=pd.to_datetime(ageofbuildings['Month'])
for i in range (0, len(ageofbuildings)):
    ageofbuildings.Month[i]=ageofbuildings.Month[i].date()

Upvotes: 0

Views: 75

Answers (1)

JoeCondron
JoeCondron

Reputation: 8906

You need to use vectorized operations to speed your code up. Firstly, it looks like the 'Month' and 'ProjID' columns are assigned to themselves whether the if clause is True or not so this is redundant. The vector equivalent to the if clause would be

same_as_prev = joined['ProjID'] == joined['ProjID'].shift()

To get the vector values to sub in you can do

values = askingrent / askingrent.shift() - 1

assuming askingrent is a pandas Series. Then you can do

joined.iloc[:, 2] = np.where(same_as_prev, values, 0)

I think the second code block can be achieved by

ageofbuildings['Month'] = pd.DatetimeIndex(ageofbuildings['Month']).date

but it's hard to know as you haven't given a look at your input/output.

Upvotes: 3

Related Questions