Reputation: 573
This is one of the lines in my code where I get the SettingWithCopyWarning:
value1['Total Population']=value1['Total Population'].replace(to_replace='*', value=4)
Which I then changed to:
row_index= value1['Total Population']=='*'
value1.loc[row_index,'Total Population'] = 4
This still gives the same warning. How do I get rid of it?
I also get the same warning from a convert_objects(convert_numeric=True) call that I've used. Is there any way to avoid that?
value1['Total Population'] = value1['Total Population'].astype(str).convert_objects(convert_numeric=True)
This is the warning message that I get:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
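convert_objects was deprecated and later removed from pandas, so on current versions the conversion in the question is usually written with pd.to_numeric. A minimal sketch, assuming made-up sample values standing in for value1 (the real data is not shown in the question):

```python
import pandas as pd

# Hypothetical sample data; the question's real `value1` is not shown.
value1 = pd.DataFrame({"Total Population": ["120", "*", "87", "n/a"]})

# Swap the "*" placeholder for "4" first, while the column still holds strings.
cleaned = value1["Total Population"].replace("*", "4")

# Convert to numbers; errors="coerce" turns unparseable entries into missing values.
value1["Total Population"] = pd.to_numeric(cleaned, errors="coerce")

print(value1)
```

This only avoids the warning if value1 owns its data; if value1 was itself filtered out of another DataFrame, it would need a .copy() at the point where it is created.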
Upvotes: 40
Views: 84905
Reputation: 319
The warning is about whether the source DataFrame is also updated when you assign to a replica (copy) that was obtained by slicing it.
If you only intend to update the replica, you can silence the warning by adding pd.set_option('mode.chained_assignment', None) before the line where the warning is raised:
df_value = pd.DataFrame({ 'Total Population':['a','b','c','*'] })
value1 = df_value[ df_value['Total Population']=='*']
pd.set_option('mode.chained_assignment', None) # <=== SettingWithCopyWarning Off
row_index = value1['Total Population']=='*'
value1.loc[row_index,'Total Population'] = 44
pd.set_option('mode.chained_assignment', 'warn') # <=== SettingWithCopyWarning Default
Upvotes: 2
Reputation: 4600
If you use .loc[row, column] and still get the same warning, it is probably because the dataframe was derived from another dataframe. You have to use .copy().
This is a step-by-step error reproduction:
import pandas as pd
d = {'col1': [1, 2, 3, 4], 'col2': [3, 4, 5, 6]}
df = pd.DataFrame(data=d)
df
# col1 col2
#0 1 3
#1 2 4
#2 3 5
#3 4 6
Creating a new column and updating its value:
df['new_column'] = None
df.loc[0, 'new_column'] = 100
df
# col1 col2 new_column
#0 1 3 100
#1 2 4 None
#2 3 5 None
#3 4 6 None
I receive no warning. But let's create another dataframe from the previous one:
new_df = df.loc[df.col1>2]
new_df
#col1 col2 new_column
#2 3 5 None
#3 4 6 None
Now, using .loc, I will try to replace some values in the same manner:
new_df.loc[2, 'new_column'] = 100
However, I got this hateful warning again:
A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
SOLUTION
Using .copy() while creating the new dataframe will solve the warning:
new_df_copy = df.loc[df.col1>2].copy()
new_df_copy.loc[2, 'new_column'] = 100
Now, you won't receive any warnings!
If your dataframe is created using a filter on top of another dataframe, always use .copy().
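To make explicit what .copy() buys you, here is a small sketch reusing the col1/col2 data from above. The name subset is only illustrative. After the copy, writes to the filtered frame are unambiguous and the original is left untouched:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [3, 4, 5, 6]})

# Explicit copy of the filtered rows: `subset` now owns its own data.
subset = df.loc[df["col1"] > 2].copy()
subset.loc[2, "col2"] = 100

# The original frame is not affected by the write above.
original_value = df.loc[2, "col2"]
print(original_value, subset.loc[2, "col2"])
```

If the intent is instead to change the original, the assignment would go on df itself, for example with a boolean mask in df.loc[...].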
Upvotes: 51
Reputation: 317
This should fix your problem:
value1.loc[:, 'Total Population'] = value1.loc[:, 'Total Population'].replace(to_replace='*', value=4)
Upvotes: 0
Reputation: 53
Specifying that it is a copy worked for me. I just added .copy() at the end of the statement:
value1['Total Population'] = value1['Total Population'].replace(to_replace='*', value=4).copy()
Upvotes: 0
Reputation: 3415
I came here because I wanted to conditionally set the value of a new column based on the value in another column.
What worked for me was numpy.where:
import numpy as np
import pandas as pd
...
df['Size'] = np.where((df.value > 10), "Greater than 10", df.value)
From the numpy docs, this is equivalent to:
[xv if c else yv
for c, xv, yv in zip(condition, x, y)]
Which is a pretty nice use of zip...
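A self-contained sketch of the same idea, assuming a made-up value column and using a string label for both branches so the resulting column has a single type:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"value": [5, 12, 30]})

# np.where picks the first label where the condition holds, the second otherwise.
df["Size"] = np.where(df["value"] > 10, "Greater than 10", "10 or less")

print(df)
```

Because the whole column is assigned on df itself, no slice of another frame is involved in this sketch.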
Upvotes: 3
Reputation: 3855
I have no idea how bad the data storage/memory implications are with this but it fixes it every time for your average dataframe:
def addCrazyColFunc(df):
    dfNew = df.copy()
    dfNew['newCol'] = 'crazy'
    return dfNew
Just like the message says: make a copy and you're good to go. If someone can fix the above without the copy, please comment. None of the .loc suggestions above work for this case.
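One possible way to write the helper above without an explicit df.copy() is DataFrame.assign, which returns a new DataFrame with the extra column and leaves its input alone. This is a sketch rather than a drop-in fix: assign still creates a new object, so it does not avoid the copy in terms of memory. The function name add_crazy_col is made up here.

```python
import pandas as pd

def add_crazy_col(df):
    # assign() returns a new DataFrame; the caller's frame is not modified.
    return df.assign(newCol="crazy")

# Hypothetical input, filtered from a larger frame as in the warning scenario.
source = pd.DataFrame({"a": [1, 2, 3]})
filtered = source[source["a"] > 1]

result = add_crazy_col(filtered)
print(result)
```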
Upvotes: 3
Reputation: 106
I was able to avoid the same warning message with syntax like this:
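value1.loc[:, 'Total Population'] = value1['Total Population'].replace('*', 4)
Note that replace returns a new Series and does not modify value1 in place, so the result still has to be assigned back. The change is that the assignment goes through .loc[:, 'Total Population'] instead of value1['Total Population']=value1['Total Population']...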
Upvotes: 1
Reputation: 573
Got the solution:
I created a new DataFrame holding only the columns that I needed to work on, and it gives me no warnings now.
Strange, but it worked.
Upvotes: 0
Reputation: 109546
Have you tried setting directly?
value1.loc[value1['Total Population'] == '*', 'Total Population'] = 4
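A runnable sketch of this one-step assignment, assuming made-up values and an object-dtype column so that strings and the integer 4 can sit side by side:

```python
import pandas as pd

# Hypothetical data; dtype=object lets the column hold both strings and ints.
value1 = pd.DataFrame({"Total Population": ["120", "*", "87"]}, dtype=object)

# Row mask and column label in a single .loc call on the frame that owns the data.
value1.loc[value1["Total Population"] == "*", "Total Population"] = 4

print(value1)
```

If value1 was itself produced by filtering another DataFrame, the same line may still warn, in which case the .copy() advice from the other answers applies.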
Upvotes: 11