Reputation: 2508
I have the following problem: in a df, I want to select specific rows and a specific column and in this selection take the first n
elements and assign a new value to them. Naively, I thought that the following code should do the job:
import seaborn as sns
import pandas as pd
df = sns.load_dataset('tips')
df.loc[df.day=="Sun", "smoker"].iloc[:4] = "Yes"
Both of the loc
and iloc
should return a view into the df and the value should be overwritten. However, the dataframe does not change. Why?
I know how to go around it -- creating a new df first just with the loc
, then changing the value using iloc
and updating back the original df (as below).
But a) I do not think it's optimal, and b) I would like to know why the top solution does not work. Why does it return a copy and not a view of a view?
The alternative solution:
df = sns.load_dataset('tips')
tmp = df.loc[df.day=="Sun", "smoker"]
tmp.iloc[:4] = "Yes"
df.loc[df.day=="Sun", "smoker"] = tmp
Note: I have read the docs, this really great post and this question but they don't explain this. Their concern is the difference between df.loc[mask,"z]
and the chained df["z"][mask]
.
Upvotes: 6
Views: 4538
Reputation: 2918
I believe df.loc[].iloc[]
is a chained assignment case and pandas doesn't guarantee that you will get a view at the end. From the docs:
Whether a copy or a reference is returned for a setting operation, may depend on the context. This is sometimes called chained assignment and should be avoided.
Since you have a filtering condition in loc
, pandas will create a new pd.Series
and than will apply an assignment to it. For example the following will work because you'll get the same series as df["smoker"]
:
df.loc[:, "smoker"].iloc[:4] = 'Yes'
But you will get SettingWithCopyWarning
warning.
You need to rewrite your code so that pandas handles this as a single loc
entity.
Another possible workaround:
df.loc[df[df.day=="Sun"].index[:4], "smoker"] = 'Yes'
Upvotes: 5
Reputation: 825
In your case, you can define the columns to impute
Let's suppose the following dataset
df = pd.DataFrame(data={'State':[1,2,3,4,5,6, 7, 8, 9, 10],
'Sno Center': ["Guntur", "Nellore", "Visakhapatnam", "Biswanath", "Nellore", "Guwahati", "Nellore", "Numaligarh", "Sibsagar", "Munger-Jamalpu"],
'Mar-21': [121, 118.8, 131.6, 123.7, 127.8, 125.9, 114.2, 114.2, 117.7, 117.7],
'Apr-21': [121.1, 118.3, 131.5, 124.5, 128.2, 128.2, 115.4, 115.1, 117.3, 118.3]})
df
State Sno Center Mar-21 Apr-21
0 1 Guntur 121.0 121.1
1 2 Nellore 118.8 118.3
2 3 Visakhapatnam 131.6 131.5
3 4 Biswanath 123.7 124.5
4 5 Nellore 127.8 128.2
5 6 Guwahati 125.9 128.2
6 7 Nellore 114.2 115.4
7 8 Numaligarh 114.2 115.1
8 9 Sibsagar 117.7 117.3
9 10 Munger-Jamalpu 117.7 118.3
So, I would like to change to 0 all dates where Sno Center
is equals to Nellore
mask = df["Sno Center"] == "Nellore"
df.loc[mask, ["Mar-21", "Apr-21"]] = 0
The result
df
State Sno Center Mar-21 Apr-21
0 1 Guntur 121.0 121.1
1 2 Nellore 0.0 0.0
2 3 Visakhapatnam 131.6 131.5
3 4 Biswanath 123.7 124.5
4 5 Nellore 0.0 0.0
5 6 Guwahati 125.9 128.2
6 7 Nellore 0.0 0.0
7 8 Numaligarh 114.2 115.1
8 9 Sibsagar 117.7 117.3
9 10 Munger-Jamalpu 117.7 118.3
Other option is to define the columns as a list
COLS = ["Mar-21", "Apr-21"]
df.loc[mask, COLS] = 0
Other options using iloc
COLS = df.iloc[:, 2:4].columns.tolist()
df.loc[mask, COLS] = 0
Or
df.loc[mask, df.iloc[:, 2:4].columns.tolist()] = 0
Upvotes: 0