My Work
My Work

Reputation: 2508

Change value in pandas after chained loc and iloc

I have the following problem: in a df, I want to select specific rows and a specific column and in this selection take the first n elements and assign a new value to them. Naively, I thought that the following code should do the job:

import seaborn as sns
import pandas as pd

df = sns.load_dataset('tips')
df.loc[df.day=="Sun", "smoker"].iloc[:4] = "Yes"

Both of the loc and iloc should return a view into the df and the value should be overwritten. However, the dataframe does not change. Why?

I know how to go around it -- creating a new df first just with the loc, then changing the value using iloc and updating back the original df (as below).

But a) I do not think it's optimal, and b) I would like to know why the top solution does not work. Why does it return a copy and not a view of a view?

The alternative solution:

df = sns.load_dataset('tips')
tmp = df.loc[df.day=="Sun", "smoker"]
tmp.iloc[:4] = "Yes"
df.loc[df.day=="Sun", "smoker"] = tmp

Note: I have read the docs, this really great post and this question but they don't explain this. Their concern is the difference between df.loc[mask,"z] and the chained df["z"][mask].

Upvotes: 6

Views: 4538

Answers (2)

Alexander Volkovsky
Alexander Volkovsky

Reputation: 2918

I believe df.loc[].iloc[] is a chained assignment case and pandas doesn't guarantee that you will get a view at the end. From the docs:

Whether a copy or a reference is returned for a setting operation, may depend on the context. This is sometimes called chained assignment and should be avoided.

Since you have a filtering condition in loc, pandas will create a new pd.Series and than will apply an assignment to it. For example the following will work because you'll get the same series as df["smoker"]:

df.loc[:, "smoker"].iloc[:4] = 'Yes'

But you will get SettingWithCopyWarning warning.

You need to rewrite your code so that pandas handles this as a single loc entity.

Another possible workaround:

df.loc[df[df.day=="Sun"].index[:4], "smoker"] = 'Yes'

Upvotes: 5

Samir Hinojosa
Samir Hinojosa

Reputation: 825

In your case, you can define the columns to impute

Let's suppose the following dataset

df = pd.DataFrame(data={'State':[1,2,3,4,5,6, 7, 8, 9, 10], 
                         'Sno Center': ["Guntur", "Nellore", "Visakhapatnam", "Biswanath", "Nellore", "Guwahati", "Nellore", "Numaligarh", "Sibsagar", "Munger-Jamalpu"], 
                         'Mar-21': [121, 118.8, 131.6, 123.7, 127.8, 125.9, 114.2, 114.2, 117.7, 117.7],
                         'Apr-21': [121.1, 118.3, 131.5, 124.5, 128.2, 128.2, 115.4, 115.1, 117.3, 118.3]})
df
    State   Sno Center      Mar-21  Apr-21
0   1       Guntur          121.0   121.1
1   2       Nellore         118.8   118.3
2   3       Visakhapatnam   131.6   131.5
3   4       Biswanath       123.7   124.5
4   5       Nellore         127.8   128.2
5   6       Guwahati        125.9   128.2
6   7       Nellore         114.2   115.4
7   8       Numaligarh      114.2   115.1
8   9       Sibsagar        117.7   117.3
9   10      Munger-Jamalpu  117.7   118.3

So, I would like to change to 0 all dates where Sno Center is equals to Nellore

mask = df["Sno Center"] == "Nellore"
df.loc[mask, ["Mar-21", "Apr-21"]] = 0

The result

df
State   Sno Center      Mar-21  Apr-21
0   1   Guntur          121.0   121.1
1   2   Nellore         0.0     0.0
2   3   Visakhapatnam   131.6   131.5
3   4   Biswanath       123.7   124.5
4   5   Nellore         0.0     0.0
5   6   Guwahati        125.9   128.2
6   7   Nellore         0.0     0.0
7   8   Numaligarh      114.2   115.1
8   9   Sibsagar        117.7   117.3
9   10  Munger-Jamalpu  117.7   118.3

Other option is to define the columns as a list

COLS = ["Mar-21", "Apr-21"]
df.loc[mask, COLS] = 0

Other options using iloc

COLS = df.iloc[:, 2:4].columns.tolist()
df.loc[mask, COLS] = 0

Or

df.loc[mask, df.iloc[:, 2:4].columns.tolist()] = 0

Upvotes: 0

Related Questions