Iterating through two columns in a pandas dataframe

Question

I'm trying to iterate over two columns in a dataframe and create a dummy column for a statsmodels analysis that flags whether the client has always renewed their contract. I do this by looking for contracts from this year (data.Year_Season == '2014-2015') where the client has renewed more than once (data.Rank_output > 1). See the code below:

def make_always_renewed_column(data):
    for i, row in data.iterrows():  
        if row.Year_Season and row.Rank_output > 1:
            return 1
        else:
            return 0 


data['alwaysRenewed'] = make_always_renewed_column(data)

But when I look at what was returned with:

data.groupby(['alwaysRenewed'])[['lead_id']].count()

All rows in the new column returned 0.

I tried this on one row that met the conditions with .iloc and it returned True.

Any ideas?

Update

Just tried it like this to no avail:

def make_always_renewed_column(data):
    for row in data.itertuples():
        if row[8] == '2014-2015' and row[10] > 1:
            return 1
        else:
            return 0

Marius · Accepted Answer

There's no need to loop through individual rows to do these types of tests. Operations like +, -, == etc. on pandas columns are vectorised, i.e. they are automatically applied to each element of the column. Your test should just look like:

data['alwaysRenewed'] = (data['Year_Season'] == '2014-2015') & (data['Rank_output'] > 1)

This will create a boolean column, i.e. a column of True/False values. These will act like 0/1 for the purposes of sums, means etc., but you can convert to 0/1 explicitly using:

data['alwaysRenewed'] = data['alwaysRenewed'].astype(int)
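
As for why the loop version gave all zeros: a likely explanation is that `return` exits the function on the very first row, so the function hands back a single 0 or 1, and assigning that scalar to a column fills every row with it. Here is a minimal sketch comparing the two approaches. The toy values are made up for illustration, `loopResult` is a hypothetical column name, and the loop uses the corrected `== '2014-2015'` comparison so that only the early `return` differs from the vectorised test.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
data = pd.DataFrame({
    "lead_id": [101, 102, 103, 104],
    "Year_Season": ["2013-2014", "2014-2015", "2014-2015", "2014-2015"],
    "Rank_output": [3, 1, 2, 4],
})


def make_always_renewed_column(data):
    # Same structure as the function in the question:
    # `return` leaves the function while still on the first row.
    for i, row in data.iterrows():
        if row.Year_Season == "2014-2015" and row.Rank_output > 1:
            return 1
        else:
            return 0


# One scalar comes back (decided by the first row only), and pandas
# broadcasts it to every row of the new column.
looped = make_always_renewed_column(data)
data["loopResult"] = looped

# Vectorised test from the answer: one boolean per row, then cast to 0/1.
mask = (data["Year_Season"] == "2014-2015") & (data["Rank_output"] > 1)
data["alwaysRenewed"] = mask.astype(int)

# Same check as in the question: count rows per flag value.
counts = data.groupby("alwaysRenewed")["lead_id"].count()
print(data)
print(counts)
```

On this toy frame the `loopResult` column holds the same value in every row, while `alwaysRenewed` varies row by row, which is the behaviour you were after.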
