harshcode
harshcode

Reputation: 1

Speeding up runtime for code with multiple if statements

I'm working on dataset that has a column that tells you how many days it took for a building permit to be approved (time_range column gives you this info). I'm trying to create another column (time_frame) that will break those approval times into categories like 1-29 days, 30 - 59 days, etc. The dataset also has permits that were denied and I already filled some of the time_frame column with denied. For the rest of the entries, I'm trying to fill it in with a category that I've created. When I run the cell in jupyter notebook, it is stuck running and hasn't outputted anything. How should I rewrite my code to use less if-else statements and possibly remove the for-loop?

Here is my code:

for i in range(0,len(df['time_range'])):

    if df.loc[i,'time_frame'] != 'denied':

    if df.loc[i,'time_range'] == 0.0:
        df.loc[i,'time_frame'] = 'instant approval'

    elif (df.loc[i,'time_range'] >= 1.0 and df.loc[i,'time_range'] <= 29.0):
        df.loc[i,'time_frame'] = '1 - 29 days'

    elif (df.loc[i,'time_range'] >= 30.0 and df.loc[i,'time_range'] <= 59.0):
        df.loc[i,'time_frame'] = '30 - 59 days'

    elif (df.loc[i,'time_range'] >= 60.0 and df.loc[i,'time_range'] <= 89.0):
        df.loc[i,'time_frame'] = '60 - 89 days'

    elif (df.loc[i,'time_range'] >= 90.0 and df.loc[i,'time_range'] <= 119.0):
        df.loc[i,'time_frame'] = '90 - 119 days'

    elif (df.loc[i,'time_range'] >= 120.0 and df.loc[i,'time_range'] <= 149.0):
        df.loc[i,'time_frame'] = '120 - 150 days'

    elif (df.loc[i,'time_range'] >= 150.0 and df.loc[i,'time_range'] <= 179.0):
        df.loc[i,'time_frame'] = '150 - 179 days'

    else:
        df.loc[i,'time_frame'] = '180+ days'

Upvotes: 0

Views: 35

Answers (1)

cs95
cs95

Reputation: 402263

Setup

df = pd.DataFrame({
        'time_frame': {0: nan, 1: nan, 2: nan, 3: 'denied', 4: nan, 5: nan, 6: nan},
        'time_range': {0: 0, 1: 10, 2: 120, 3: 10, 4: 50, 5: 175, 6: 250}})

df
   time_range time_frame
0           0        NaN
1          10        NaN
2         120        NaN
3          10     denied
4          50        NaN
5         175        NaN
6         250        NaN

Use pd.cut and mask those rows where "time_frame" is "denied":

bins = [-np.inf, 0, 29, 59, 89, 119, 149, 179, np.inf]
labels = [
    'instant', '1-29 days', '30-59 days', '60-89 days', 
    '90-119 days', '120-149 days', '150-179 days', '180+ days']

df['time_frame'] = (
    pd.cut(df['time_range'], bins=bins, labels=labels, right=True)
      .where(df['time_frame'].ne('denied'), 'denied'))

print(df)
   time_range    time_frame
0           0       instant
1          10     1-29 days
2         120  120-149 days
3          10        denied
4          50    30-59 days
5         175  150-179 days
6         250     180+ days

Upvotes: 2

Related Questions