user10643490

Why isn't my inner loop working correctly?

I'm relatively new to pandas DataFrames. I have the following code, which uses a nested for loop.

The problem arises when the outer loop reaches the inner loop.

import pandas as pd 

### Create Data Frames:
Patients = {'Name': ['Jordan','Jess', 'Jake', 'Alice', 'Alan', 'Lauren'], 'Age': [26,23,19,20,24,28],'Sex': ['M', 'F', 'M', 'F', 'M', 'F'],
            'BMI': [26,22,24,17,35,20],'Smokes': ['No', 'No', 'Yes', 'No', 'Yes', 'No']}

pdf = pd.DataFrame(Patients)
print(pdf) ## DF printed out completely 

## 
i = 0

for Smokes in pdf.Smokes:
    if Smokes == 'Yes':
        pdf.at[i, 'Risk'] = 'high'
    else: 
       pdf.at[i, 'Risk'] = ' '
       for BMI in pdf.BMI:
           if BMI >= 30 or BMI <= 19:
               pdf.at[i, 'Risk'] = 'high'
           elif BMI >= 25 and BMI <=29:
               pdf.at[i, 'Risk'] = 'medium'
           else: 
               pdf.at[i, 'Risk'] = 'Low'
    i +=1 #

However, when I print out the pdf again, it shows:

     Name  Age Sex  BMI Smokes  Risk
0  Jordan   26   M   26     No   Low
1    Jess   23   F   22     No   Low
2    Jake   19   M   24    Yes  high
3   Alice   20   F   17     No   Low
4    Alan   24   M   35    Yes  high
5  Lauren   28   F   20     No   Low

Jordan should be medium risk and Alice should be a low-risk patient, but the inner loop is not recognizing that. However, when I run the BMI loop separately, it classifies them as expected.

Upvotes: 1

Views: 92

Answers (7)

Steve

Reputation: 483

Often when using pandas dataframes, there are efficient ways to complete tasks without using for loops. In your case, you can define a function that returns the 'Risk' value string and apply it across the columns of each row to set the desired new column:

import pandas as pd 

# create dataframe
Patients = {'Name': ['Jordan','Jess', 'Jake', 'Alice', 'Alan', 'Lauren'], 'Age': [26,23,19,20,24,28],'Sex': ['M', 'F', 'M', 'F', 'M', 'F'],
            'BMI': [26,22,24,17,35,20],'Smokes': ['No', 'No', 'Yes', 'No', 'Yes', 'No']}
pdf = pd.DataFrame(Patients)
# inspect dataframe
print(pdf)

# define the function that you want to apply
def get_risk(series):

    if series.Smokes == 'Yes':
        return 'high'    
    else:
        if series.BMI >= 30 or series.BMI <= 19:
            return 'high'
        elif series.BMI >= 25 and series.BMI <= 29:
            return 'medium'
        else:
            return 'low'

# apply the function across the columns of the dataframe (sending each row to the function as a series)
pdf['Risk'] = pdf.apply(get_risk, axis='columns')

# inspect the results
print(pdf)

Upvotes: 1

kederrac

Reputation: 17322

In the first else branch you have a for loop that goes through every BMI value again for each row, overwriting that row's Risk each time. Use your code without the inner for loop:

if Smokes == 'Yes':
    pdf.at[i, 'Risk'] = 'high'
elif pdf.at[i, 'BMI'] >= 30 or pdf.at[i, 'BMI'] <= 19:
    pdf.at[i, 'Risk'] = 'high'
elif pdf.at[i, 'BMI'] >= 25 and pdf.at[i, 'BMI'] <= 29:
    pdf.at[i, 'Risk'] = 'medium'
else:
    pdf.at[i, 'Risk'] = 'Low'

With these small changes to your code, each row's Risk is derived from that row's own BMI, so Jordan becomes 'medium'.
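For reference, here is a minimal runnable sketch of the whole loop with the inner for loop removed. It rebuilds the pdf DataFrame from the question. Two details are my own assumptions rather than part of the answer: the Risk column is created up front, and enumerate replaces the manual counter i.

```python
import pandas as pd

patients = {'Name': ['Jordan', 'Jess', 'Jake', 'Alice', 'Alan', 'Lauren'],
            'Age': [26, 23, 19, 20, 24, 28],
            'Sex': ['M', 'F', 'M', 'F', 'M', 'F'],
            'BMI': [26, 22, 24, 17, 35, 20],
            'Smokes': ['No', 'No', 'Yes', 'No', 'Yes', 'No']}
pdf = pd.DataFrame(patients)

# Assumption: create the column first so .at only overwrites existing cells
pdf['Risk'] = ''

# enumerate supplies the row position, which matches the default 0..n-1 index labels
for i, smokes in enumerate(pdf['Smokes']):
    bmi = pdf.at[i, 'BMI']  # this row's BMI only, no inner loop
    if smokes == 'Yes':
        pdf.at[i, 'Risk'] = 'high'
    elif bmi >= 30 or bmi <= 19:
        pdf.at[i, 'Risk'] = 'high'
    elif 25 <= bmi <= 29:
        pdf.at[i, 'Risk'] = 'medium'
    else:
        pdf.at[i, 'Risk'] = 'Low'

print(pdf['Risk'].tolist())
# ['medium', 'Low', 'high', 'high', 'high', 'Low']
```

Note that Alice ends up 'high' here, because her BMI of 17 satisfies BMI <= 19.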

Upvotes: 1

Mahsa Hassankashi

Reputation: 2139

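Use np.where instead of a for loop; it is clean and fast. Nest a second np.where so that the later condition does not overwrite the earlier result:

import numpy as np

# smokers and patients with an extreme BMI are high risk
high = (pdf['Smokes'] == 'Yes') | (pdf['BMI'] >= 30) | (pdf['BMI'] <= 19)
pdf['Risk'] = np.where(high, 'high',
                       np.where(pdf['BMI'].between(25, 29), 'medium', 'low'))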

Upvotes: 0

ALollz

Reputation: 59529

Don't loop. Use np.select to create the hierarchy of conditions and the corresponding choices and assign the correct value. It gives precedence to the first True found in conditions so we order it 'high', 'medium', 'low'.

import numpy as np

conditions = [pdf['Smokes'].eq('Yes') | pdf['BMI'].ge(30) | pdf['BMI'].le(19),  # high
              pdf['BMI'].between(25, 30)]                                       # medium
choice_list = ['high', 'medium']

pdf['Risk'] = np.select(conditions, choice_list, default='low')

     Name  Age Sex  BMI Smokes    Risk
0  Jordan   26   M   26     No  medium
1    Jess   23   F   22     No     low
2    Jake   19   M   24    Yes    high
3   Alice   20   F   17     No    high
4    Alan   24   M   35    Yes    high
5  Lauren   28   F   20     No     low
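To illustrate the precedence rule, here is a small sketch on a hypothetical three-row frame (the name demo and its values are made up for this example). A BMI of 30 satisfies both the 'high' condition and between(25, 30), and np.select keeps the first match.

```python
import numpy as np
import pandas as pd

# Hypothetical data: BMI 30 matches both conditions below
demo = pd.DataFrame({'BMI': [30, 27, 22], 'Smokes': ['No', 'No', 'No']})

conditions = [demo['Smokes'].eq('Yes') | demo['BMI'].ge(30) | demo['BMI'].le(19),  # high
              demo['BMI'].between(25, 30)]                                         # medium
choice_list = ['high', 'medium']

# The first True condition per row wins; rows matching none get the default
result = np.select(conditions, choice_list, default='low').tolist()
print(result)
# ['high', 'medium', 'low']
```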

Upvotes: 2

Mehul Gupta

Reputation: 1939

import pandas as pd 

### Create Data Frames:
Patients = {'Name': ['Jordan','Jess', 'Jake', 'Alice', 'Alan', 'Lauren'], 'Age': [26,23,19,20,24,28],'Sex': ['M', 'F', 'M', 'F', 'M', 'F'],
        'BMI': [26,22,24,17,35,20],'Smokes': ['No', 'No', 'Yes', 'No', 'Yes', 'No']}

pdf = pd.DataFrame(Patients)

risk = []
for index, rows in pdf.iterrows():
    if rows['Smokes'] == 'Yes':
        risk.append('high')
    else:
        BMI = rows['BMI']
        if BMI >= 30 or BMI <= 19:
            risk.append('high')
        elif BMI >= 25 and BMI <= 29:
            risk.append('medium')
        else:
            risk.append('Low')
pdf['risk'] = risk

Explanation: iterrows() iterates over the rows of the DataFrame, and 'rows' holds all the values of the current row. I collect the results in a separate list, 'risk', appending one value per row, and finally add that list to the DataFrame as a new column. P.S. Alice will be high risk under your conditions, because her BMI is below 19.
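As a quick illustration of what iterrows() hands back on each step, here is a sketch on a two-row subset of the data (the variable name seen is just for this example):

```python
import pandas as pd

pdf = pd.DataFrame({'Name': ['Jordan', 'Jake'],
                    'BMI': [26, 24],
                    'Smokes': ['No', 'Yes']})

seen = []
for index, row in pdf.iterrows():
    # index is the row label; row is a Series keyed by column name
    seen.append((index, row['Name'], row['Smokes']))

print(seen)
# [(0, 'Jordan', 'No'), (1, 'Jake', 'Yes')]
```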

Upvotes: 0

NomadMonad

Reputation: 649

Might be worth refactoring your code into something like

def get_risk(row):
    if row['Smokes'] == 'Yes':
        return 'high'
    elif row['BMI'] >= 30 or row['BMI'] <= 19:
        return 'high'
    elif row['BMI'] >= 25 and row['BMI'] <=29:
        return 'medium'
    else:
        return 'low'

pdf['Risk'] = pdf.apply(get_risk, axis=1)

I am not sure whether the logic in your risk calculation gives you what you expect though. I copied it verbatim from your example.
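One way to check the logic is to call get_risk on a few hand-picked rows. The sketch below passes plain dicts instead of DataFrame rows (an assumption made for brevity; they support the same row['BMI'] lookups), and the BMI 29.5 case is hypothetical, not taken from your data.

```python
def get_risk(row):
    if row['Smokes'] == 'Yes':
        return 'high'
    elif row['BMI'] >= 30 or row['BMI'] <= 19:
        return 'high'
    elif row['BMI'] >= 25 and row['BMI'] <= 29:
        return 'medium'
    else:
        return 'low'

# Hand-picked rows; the 29.5 case is hypothetical and not in the question's data
checks = {
    'Alice (BMI 17, non-smoker)': get_risk({'Smokes': 'No', 'BMI': 17}),
    'Jordan (BMI 26, non-smoker)': get_risk({'Smokes': 'No', 'BMI': 26}),
    'hypothetical (BMI 29.5, non-smoker)': get_risk({'Smokes': 'No', 'BMI': 29.5}),
}
for label, risk in checks.items():
    print(label, '->', risk)
```

Alice comes out as 'high' because 17 <= 19, and a fractional BMI such as 29.5 falls through to 'low' because the 'medium' test stops at 29.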

Upvotes: 1

David Smolinski

Reputation: 534

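Your inner for loop runs over every BMI in the column and overwrites pdf.at[i, 'Risk'] on each pass, so when it finishes the value is always 'Low', because the last BMI (20) falls in the low range. This happens on every iteration of the outer loop for a non-smoker.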
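A minimal sketch of that effect in plain Python, using the BMI values from the question (the helper classify and the list history are names made up for this illustration):

```python
bmis = [26, 22, 24, 17, 35, 20]

def classify(bmi):
    # Same thresholds as the question's inner loop
    if bmi >= 30 or bmi <= 19:
        return 'high'
    elif bmi >= 25 and bmi <= 29:
        return 'medium'
    else:
        return 'Low'

risk = None
history = []
for bmi in bmis:
    risk = classify(bmi)   # overwrites the previous value, like pdf.at[i, 'Risk'] = ...
    history.append(risk)

print(history)  # ['medium', 'Low', 'Low', 'high', 'high', 'Low']
print(risk)     # Low, the classification of the last BMI only
```

Whatever row i the outer loop is on, the single cell pdf.at[i, 'Risk'] goes through this whole sequence and keeps only the final value.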

Upvotes: 0
