Reputation: 1
I'm working on dataset that has a column that tells you how many days it took for a building permit to be approved (time_range column gives you this info). I'm trying to create another column (time_frame) that will break those approval times into categories like 1-29 days, 30 - 59 days, etc. The dataset also has permits that were denied and I already filled some of the time_frame column with denied. For the rest of the entries, I'm trying to fill it in with a category that I've created. When I run the cell in jupyter notebook, it is stuck running and hasn't outputted anything. How should I rewrite my code to use less if-else statements and possibly remove the for-loop?
Here is my code:
for i in range(0,len(df['time_range'])):
if df.loc[i,'time_frame'] != 'denied':
if df.loc[i,'time_range'] == 0.0:
df.loc[i,'time_frame'] = 'instant approval'
elif (df.loc[i,'time_range'] >= 1.0 and df.loc[i,'time_range'] <= 29.0):
df.loc[i,'time_frame'] = '1 - 29 days'
elif (df.loc[i,'time_range'] >= 30.0 and df.loc[i,'time_range'] <= 59.0):
df.loc[i,'time_frame'] = '30 - 59 days'
elif (df.loc[i,'time_range'] >= 60.0 and df.loc[i,'time_range'] <= 89.0):
df.loc[i,'time_frame'] = '60 - 89 days'
elif (df.loc[i,'time_range'] >= 90.0 and df.loc[i,'time_range'] <= 119.0):
df.loc[i,'time_frame'] = '90 - 119 days'
elif (df.loc[i,'time_range'] >= 120.0 and df.loc[i,'time_range'] <= 149.0):
df.loc[i,'time_frame'] = '120 - 150 days'
elif (df.loc[i,'time_range'] >= 150.0 and df.loc[i,'time_range'] <= 179.0):
df.loc[i,'time_frame'] = '150 - 179 days'
else:
df.loc[i,'time_frame'] = '180+ days'
Upvotes: 0
Views: 35
Reputation: 402263
Setup
df = pd.DataFrame({
'time_frame': {0: nan, 1: nan, 2: nan, 3: 'denied', 4: nan, 5: nan, 6: nan},
'time_range': {0: 0, 1: 10, 2: 120, 3: 10, 4: 50, 5: 175, 6: 250}})
df
time_range time_frame
0 0 NaN
1 10 NaN
2 120 NaN
3 10 denied
4 50 NaN
5 175 NaN
6 250 NaN
Use pd.cut
and mask those rows where "time_frame" is "denied":
bins = [-np.inf, 0, 29, 59, 89, 119, 149, 179, np.inf]
labels = [
'instant', '1-29 days', '30-59 days', '60-89 days',
'90-119 days', '120-149 days', '150-179 days', '180+ days']
df['time_frame'] = (
pd.cut(df['time_range'], bins=bins, labels=labels, right=True)
.where(df['time_frame'].ne('denied'), 'denied'))
print(df)
time_range time_frame
0 0 instant
1 10 1-29 days
2 120 120-149 days
3 10 denied
4 50 30-59 days
5 175 150-179 days
6 250 180+ days
Upvotes: 2