This doesn't work:
def rator(row):
    if row['country'] == 'Canada':
        row['stars'] = 3
    elif row['points'] >= 95:
        row['stars'] = 3
    elif row['points'] >= 85:
        row['stars'] = 2
    else:
        row['stars'] = 1
    return row

with_stars = reviews.apply(rator, axis='columns')
But this works:
def rator(row):
    if row['country'] == 'Canada':
        return 3
    elif row['points'] >= 95:
        return 3
    elif row['points'] >= 85:
        return 2
    else:
        return 1

with_stars = reviews.apply(rator, axis='columns')
I'm practicing on Kaggle and reading through their tutorial as well as the documentation, but I'm a bit confused by the concept.
I understand that the apply() method acts on an entire row of a DataFrame, while map() acts on each element in a column, and that apply() is supposed to return a DataFrame, while map() returns a Series.
I'm just not sure how the mechanics work here, since it won't let me return modified rows from the function.
Some of the data:
country description designation points price province region_1 region_2 taster_name taster_twitter_handle title variety winery
0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco -1.447138 NaN Sicily & Sardinia Etna NaN Kerin O’Keefe @kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos -1.447138 15.0 Douro NaN NaN Roger Voss @vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
Index(['country', 'description', 'designation', 'points', 'price', 'province',
       'region_1', 'region_2', 'taster_name', 'taster_twitter_handle', 'title',
       'variety', 'winery'],
      dtype='object')
https://www.kaggle.com/residentmario/summary-functions-and-maps
You shouldn't use apply with a function that modifies its input. You could change your code to this:
def rator(row):
    # Work on a copy so the row passed in by apply is never mutated.
    new_row = row.copy()
    if row['country'] == 'Canada':
        new_row['stars'] = 3
    elif row['points'] >= 95:
        new_row['stars'] = 3
    elif row['points'] >= 85:
        new_row['stars'] = 2
    else:
        new_row['stars'] = 1
    return new_row

with_stars = reviews.apply(rator, axis='columns')
However, it's simpler to return just the value you care about rather than returning an entire row (and rebuilding a whole DataFrame) just to add one column. If you write rator to return only the star rating, but you still want an entire DataFrame, you can do with_stars = reviews.copy() and then with_stars['stars'] = reviews.apply(rator, axis='columns'). Also, if an if branch ends with a return, you can use a plain if after it rather than elif. You can also simplify your code with pd.cut.
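The suggestions above can be sketched on a small made-up DataFrame standing in for the Kaggle reviews data (the rows are invented for illustration). The rator here uses plain if statements, the single column is added to a copy, and the pd.cut part is one possible way to apply the last suggestion; the bin edges, right=False, and the mask call for the Canada rule are assumptions of this sketch, not code from the answer.

```python
import pandas as pd

# Made-up stand-in for the Kaggle `reviews` DataFrame.
reviews = pd.DataFrame({
    "country": ["Italy", "Canada", "US", "France", "US"],
    "points": [96, 80, 90, 85, 84],
})

def rator(row):
    # Every branch returns, so plain `if` is enough (no `elif` needed).
    if row["country"] == "Canada":
        return 3
    if row["points"] >= 95:
        return 3
    if row["points"] >= 85:
        return 2
    return 1

# Leave `reviews` untouched and add the single computed column to a copy.
with_stars = reviews.copy()
with_stars["stars"] = reviews.apply(rator, axis="columns")

# Vectorized alternative: right=False makes the bins [a, b),
# so 85 lands in the 2-star bin and 95 in the 3-star bin.
stars = pd.cut(
    reviews["points"],
    bins=[float("-inf"), 85, 95, float("inf")],
    labels=[1, 2, 3],
    right=False,
).astype(int)
# Canadian wines get 3 stars regardless of points.
stars = stars.mask(reviews["country"] == "Canada", 3)

print(with_stars["stars"].tolist())  # [3, 3, 2, 2, 1]
```

On this toy data both approaches give the same ratings; the pd.cut version avoids calling a Python function once per row, which tends to matter more as the DataFrame grows.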
When you use apply, the function is applied iteratively to each row (or column, depending on the axis parameter). Because your function returns a single value per row, the return value of apply here is not a DataFrame but a Series built from those return values (if your function returned a Series for each row, apply would build a DataFrame instead). That means your second piece of code returns the star rating of each row, and those ratings are used to build a new Series. So a better name for storing the return value is star_ratings instead of with_stars.
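A minimal sketch of how the return type of apply follows from what the function returns, using a made-up two-row DataFrame; the helpers star_or_one and points_row are hypothetical names chosen only for this illustration.

```python
import pandas as pd

# Made-up two-row stand-in for the `reviews` DataFrame.
reviews = pd.DataFrame({"country": ["Italy", "Canada"], "points": [96, 80]})

def star_or_one(row):
    # Hypothetical helper: returns one scalar per row.
    if row["country"] == "Canada" or row["points"] >= 95:
        return 3
    return 1

def points_row(row):
    # Hypothetical helper: builds and returns a NEW Series per row;
    # the row passed in by apply is not modified.
    return pd.Series({"points": row["points"], "doubled": row["points"] * 2})

star_ratings = reviews.apply(star_or_one, axis="columns")  # scalars -> Series
expanded = reviews.apply(points_row, axis="columns")       # Series -> DataFrame

print(type(star_ratings).__name__)  # Series
print(type(expanded).__name__)      # DataFrame
```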
If you want to append this Series to your original DataFrame, you can use:
star_ratings = reviews.apply(rator, axis='columns')
reviews['stars'] = star_ratings
or, more succinctly:
reviews['stars'] = reviews.apply(rator, axis='columns')
As for why your first piece of code does not work: you are trying to add a new entry to the row that apply passes to your function, and you are not supposed to mutate the passed object. The official docs state:
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported
To better understand the differences between map and apply, please see the different answers to this question, as they present many different and correct viewpoints.
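To make the map/apply contrast concrete, here is a small sketch on made-up data: Series.map receives one element of a single column at a time, while DataFrame.apply with axis="columns" receives a whole row and can combine several columns. Both calls below return a Series, because both lambdas return scalars.

```python
import pandas as pd

# Made-up stand-in for the `reviews` DataFrame.
reviews = pd.DataFrame({"country": ["Italy", "Canada"], "points": [96, 80]})

# Series.map: the lambda sees one element of the `points` column at a time.
passed = reviews["points"].map(lambda p: p >= 85)

# DataFrame.apply(axis="columns"): the lambda sees one whole row (a Series),
# so it can read several columns at once.
label = reviews.apply(lambda row: f"{row['country']}: {row['points']}", axis="columns")

print(passed.tolist())  # [True, False]
print(label.tolist())   # ['Italy: 96', 'Canada: 80']
```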