Reputation: 273
I'm trying to iterate over two columns in a dataframe and create a dummy column for a statsmodels analysis that flags whether the client has always renewed their contract, by looking for contracts from this year (data.Year_Season == '2014-2015') and checking that the client has renewed more than once (data.Rank_output > 1). See the code below:
def make_always_renewed_column(data):
    for i, row in data.iterrows():
        if row.Year_Season and row.Rank_output > 1:
            return 1
        else:
            return 0

data['alwaysRenewed'] = make_always_renewed_column(data)
But when I look at what was returned with:
data.groupby(['alwaysRenewed'])[['lead_id']].count()
All rows in the new column returned 0.
I tried this on one row that met the conditions with .iloc and it returned True.
Any ideas?
Just tried it like this to no avail:
def make_always_renewed_column(data):
    for row in data.itertuples():
        if row[8] == '2014-2015' and row[10] > 1:
            return 1
        else:
            return 0
Upvotes: 2
Views: 2776
Reputation: 60140
There's no need to loop through individual rows to do these types of tests. Operations like +, -, == etc. on pandas columns are vectorised, i.e. they are automatically applied to each element of the column. Your test should just look like:
data['alwaysRenewed'] = (data['Year_Season'] == '2014-2015') & (data['Rank_output'] > 1)
This will create a boolean column, i.e. a column of True/False values. These will act like 0/1 for the purposes of sums, means etc., but you can convert to 0/1 explicitly using:
data['alwaysRenewed'] = data['alwaysRenewed'].astype(int)
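As a rough illustration of why the loop filled the whole column with one value, here is a small sketch. The sample rows and the helper name first_row_only are made up for the example; only the column names come from the question. The loop returns on the first row it sees, so it produces a single scalar, and assigning a scalar to a column broadcasts it to every row. The vectorised test yields one value per row instead.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "lead_id": [1, 2, 3, 4],
    "Year_Season": ["2014-2015", "2014-2015", "2013-2014", "2014-2015"],
    "Rank_output": [1, 3, 2, 2],
})

def first_row_only(df):
    # Mirrors the structure of the question's loop: `return` exits on the
    # very first row, so the result is one scalar, not one value per row.
    for row in df.itertuples(index=False):
        if row.Year_Season == "2014-2015" and row.Rank_output > 1:
            return 1
        else:
            return 0

# A single number; assigning it to a column would repeat it on every row.
scalar_result = first_row_only(data)

# Vectorised test: one boolean per row, then cast to 0/1.
data["alwaysRenewed"] = (
    (data["Year_Season"] == "2014-2015") & (data["Rank_output"] > 1)
).astype(int)

# Count rows in each group, as in the question's check.
counts = data.groupby("alwaysRenewed")["lead_id"].count()
print(scalar_result)
print(counts)
```

With this sample the first row fails the test, so the loop version would have written 0 everywhere, while the vectorised version marks the second and fourth rows.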
Upvotes: 2