I'm relatively new to pandas DataFrames. I have the following code with a nested for loop.
The problem arises when execution reaches the inner loop.
import pandas as pd
### Create Data Frames:
Patients = {'Name': ['Jordan','Jess', 'Jake', 'Alice', 'Alan', 'Lauren'], 'Age': [26,23,19,20,24,28],'Sex': ['M', 'F', 'M', 'F', 'M', 'F'],
'BMI': [26,22,24,17,35,20],'Smokes': ['No', 'No', 'Yes', 'No', 'Yes', 'No']}
pdf = pd.DataFrame(Patients)
print(pdf) ## DF printed out completely
##
i = 0
for Smokes in pdf.Smokes:
    if Smokes == 'Yes':
        pdf.at[i, 'Risk'] = 'high'
    else:
        pdf.at[i, 'Risk'] = ' '
        for BMI in pdf.BMI:
            if BMI >= 30 or BMI <= 19:
                pdf.at[i, 'Risk'] = 'high'
            elif BMI >= 25 and BMI <=29:
                pdf.at[i, 'Risk'] = 'medium'
            else:
                pdf.at[i, 'Risk'] = 'Low'
    i +=1
However, when I print out the pdf again, it shows:
     Name  Age Sex  BMI Smokes  Risk
0  Jordan   26   M   26     No   Low
1    Jess   23   F   22     No   Low
2    Jake   19   M   24    Yes  high
3   Alice   20   F   17     No   Low
4    Alan   24   M   35    Yes  high
5  Lauren   28   F   20     No   Low
Jordan should be a medium-risk patient and Alice should be a low-risk patient, but the inner loop does not recognize that. However, when I run the BMI loop separately, it classifies them as expected.
Upvotes: 1
Views: 92
Reputation: 483
Often when using pandas dataframes, there are efficient ways to complete tasks without using for
loops. In your case, you can define a function that returns the 'Risk' value string and apply
it across the columns of each row to set the desired new column:
import pandas as pd
# create dataframe
Patients = {'Name': ['Jordan','Jess', 'Jake', 'Alice', 'Alan', 'Lauren'], 'Age': [26,23,19,20,24,28],'Sex': ['M', 'F', 'M', 'F', 'M', 'F'],
'BMI': [26,22,24,17,35,20],'Smokes': ['No', 'No', 'Yes', 'No', 'Yes', 'No']}
pdf = pd.DataFrame(Patients)
# inspect dataframe
print(pdf)
# define the function that you want to apply
def get_risk(series):
    if series.Smokes == 'Yes':
        return 'high'
    else:
        if series.BMI >= 30 or series.BMI <= 19:
            return 'high'
        elif series.BMI >= 25 and series.BMI <= 29:
            return 'medium'
        else:
            return 'low'
# apply the function across the columns of the dataframe (sending each row to the function as a series)
pdf['Risk'] = pdf.apply(get_risk, axis='columns')
# inspect the results
print(pdf)
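To see what apply hands to the function, here is a minimal sketch on a small made-up frame. The names demo and describe_row are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical two-row frame, just for illustration
demo = pd.DataFrame({'BMI': [26, 17], 'Smokes': ['No', 'Yes']})

def describe_row(row):
    # With axis='columns', each call receives one row as a Series
    # whose index holds the column names
    return f"{type(row).__name__}: BMI={row.BMI}, Smokes={row.Smokes}"

seen = demo.apply(describe_row, axis='columns')
print(seen.tolist())
```

Because each row arrives as a Series, attribute access such as row.BMI and key access such as row['BMI'] should both work inside the function.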
Upvotes: 1
Reputation: 17322
In the first else branch you have a for loop that re-evaluates every BMI value for the current row, so the Risk value is overwritten on each pass. Use your code without the inner for loop:
i = 0
for Smokes in pdf.Smokes:
    if Smokes == 'Yes':
        pdf.at[i, 'Risk'] = 'high'
    elif pdf.at[i, 'BMI'] >= 30 or pdf.at[i, 'BMI'] <= 19:
        pdf.at[i, 'Risk'] = 'high'
    elif pdf.at[i, 'BMI'] >= 25 and pdf.at[i, 'BMI'] <=29:
        pdf.at[i, 'Risk'] = 'medium'
    else:
        pdf.at[i, 'Risk'] = 'Low'
    i += 1
With these small changes to your code, pdf will be:
Upvotes: 1
Reputation: 2139
Use np.where instead of a for loop; it is clean and fast:
import numpy as np
# Nest the calls so that later conditions do not overwrite earlier results
pdf['Risk'] = np.where(pdf['Smokes'] == 'Yes', 'high',
              np.where((pdf['BMI'] >= 30) | (pdf['BMI'] <= 19), 'high',
              np.where(pdf['BMI'].between(25, 29), 'medium', 'low')))
Upvotes: 0
Reputation: 59529
Don't loop. Use np.select to create the hierarchy of conditions and the corresponding choices and assign the correct value. It gives precedence to the first True found in conditions, so we order them 'high', 'medium', 'low'.
import numpy as np
# pdf is the DataFrame from the question
conditions = [pdf['Smokes'].eq('Yes') | pdf['BMI'].ge(30) | pdf['BMI'].le(19),  # high
              pdf['BMI'].between(25, 30)]  # medium
choice_list = ['high', 'medium']
pdf['Risk'] = np.select(conditions, choice_list, default='low')
     Name  Age Sex  BMI Smokes    Risk
0  Jordan   26   M   26     No  medium
1    Jess   23   F   22     No     low
2    Jake   19   M   24    Yes    high
3   Alice   20   F   17     No    high
4    Alan   24   M   35    Yes    high
5  Lauren   28   F   20     No     low
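The first-match precedence can be checked in isolation with a small sketch. The array bmi and the variable labels are made up for this illustration and are not part of the answer's code.

```python
import numpy as np

# Hypothetical BMI values, chosen so that 35 satisfies both conditions
bmi = np.array([35, 27, 22])

# For 35 both conditions are True; np.select uses the first one that matches
conditions = [bmi >= 30, bmi >= 25]
choices = ['high', 'medium']

# Rows matching no condition get the default
labels = np.select(conditions, choices, default='low')
print(labels.tolist())  # ['high', 'medium', 'low']
```

This is why the broader 'high' condition has to come before 'medium' in the list.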
Upvotes: 2
Reputation: 1939
import pandas as pd
### Create Data Frames:
Patients = {'Name': ['Jordan','Jess', 'Jake', 'Alice', 'Alan', 'Lauren'], 'Age': [26,23,19,20,24,28],'Sex': ['M', 'F', 'M', 'F', 'M', 'F'],
'BMI': [26,22,24,17,35,20],'Smokes': ['No', 'No', 'Yes', 'No', 'Yes', 'No']}
pdf = pd.DataFrame(Patients)
risk = []
for index,rows in pdf.iterrows():
    if rows['Smokes'] == 'Yes':
        risk.append('high')
    else:
        BMI = rows['BMI']
        if BMI >= 30 or BMI <= 19:
            risk.append('high')
        elif BMI >= 25 and BMI <=29:
            risk.append('medium')
        else:
            risk.append('Low')
pdf['risk']=risk
Explanation: iterrows() iterates over each row of the DataFrame, and 'rows' holds all the values of the current row. I collect the result for each row in a separate list, 'risk', and finally add that list to the DataFrame as a new column. P.S. Alice will be high risk under your conditions, because her BMI is lower than 19.
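As a rough sketch of what iterrows() yields on each pass, consider a two-row frame. The names demo and pairs are hypothetical and used only here.

```python
import pandas as pd

# Hypothetical frame with two of the patients from the question
demo = pd.DataFrame({'Name': ['Jordan', 'Alice'], 'BMI': [26, 17]})

pairs = []
for index, row in demo.iterrows():
    # Each iteration yields the index label and the row as a Series
    pairs.append((index, row['Name'], row['BMI']))

print(pairs)
```

Each tuple pairs the index label with values read from the row by column name, which is the same access pattern the loop above relies on.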
Upvotes: 0
Reputation: 649
Might be worth refactoring your code into something like
def get_risk(row):
    if row['Smokes'] == 'Yes':
        return 'high'
    elif row['BMI'] >= 30 or row['BMI'] <= 19:
        return 'high'
    elif row['BMI'] >= 25 and row['BMI'] <=29:
        return 'medium'
    else:
        return 'low'
pdf['Risk'] = pdf.apply(get_risk, axis=1)
I am not sure whether the logic in your risk calculation gives you what you expect though. I copied it verbatim from your example.
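One case that may be worth checking, as a sketch only: with the thresholds copied from the question, a non-integer BMI between 29 and 30 matches neither the 'high' nor the 'medium' branch. The helper get_risk_from_bmi is a hypothetical name and covers only the BMI part of the rule.

```python
def get_risk_from_bmi(bmi):
    # Same BMI thresholds as in the question; the smoking check is left out
    if bmi >= 30 or bmi <= 19:
        return 'high'
    elif bmi >= 25 and bmi <= 29:
        return 'medium'
    else:
        return 'low'

print(get_risk_from_bmi(29))    # medium
print(get_risk_from_bmi(29.5))  # low, because 29.5 falls between the two ranges
print(get_risk_from_bmi(17))    # high
```

With the integer BMI values in the question this gap never shows up, so whether it matters depends on the real data.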
Upvotes: 1
Reputation: 534
Your inner for loop runs through every BMI and, when it finishes, leaves pdf.at[i, 'Risk'] = 'Low' because the last BMI (20) is in the normal range. This happens on every iteration of the outer loop that reaches the else branch.
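The overwriting can be reproduced without pandas. This sketch runs the same branches over the BMI column from the question; the variables risk and history are made up for the illustration.

```python
bmis = [26, 22, 24, 17, 35, 20]  # BMI column from the question

risk = ' '
history = []
for bmi in bmis:
    # Same branches as the inner loop; each pass overwrites the previous value
    if bmi >= 30 or bmi <= 19:
        risk = 'high'
    elif bmi >= 25 and bmi <= 29:
        risk = 'medium'
    else:
        risk = 'Low'
    history.append(risk)

print(history)  # ['medium', 'Low', 'Low', 'high', 'high', 'Low']
print(risk)     # Low
```

Only the value from the final pass survives, which is why every non-smoker ends up as 'Low'.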
Upvotes: 0