SettingWithCopyWarning message in Pandas/Python with df.loc

Question

Note: I've spent a few hours searching SO, the Pandas docs, and a few other websites, but couldn't understand why my code isn't working.

My UDF:

def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    dfb = dfb[~dfb.isOutlier]

    dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    dfb = dfb.astype({'indice': 'int64'})
    return dfb

Important:

The isOutlier column does not exist yet; this function creates it.
The indice column does not exist yet; this function creates it.
valor_unitario exists and it's a float.
lb and ub are defined beforehand.
This function is called inside a loop in the main code (the warning is raised from the first iteration, n=0).

Warning raised

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

I found a few articles and questions on the web and on StackOverflow saying that using loc would solve the problem. I tried it, but with no success.

1st try - Using loc

def indice(dfb, lb, ub):
->  dfb.loc[:,'isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    dfb = dfb[~dfb.isOutlier]

->  dfb.loc[:,'indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    dfb = dfb.astype({'indice': 'int64'})
    return dfb

I also tried applying loc to one assignment at a time. In fact, I tried many possible combinations, such as using df.loc on dfb['valor_unitario'], and so on.

Now I get the same warning twice, with slightly different source lines:

self._setitem_single_column(ilocs[0], value, pi) and self.obj[key] = value

C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1676: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self._setitem_single_column(ilocs[0], value, pi)

and

C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self.obj[key] = value

I also tried using copy(). The first time this warning showed up, simply using copy() solved the problem. I don't know why it's not working now (I just loaded more data).

2nd try - Using copy()

I tried placing copy() in three places, with no success:

dfb = dfb[~dfb.isOutlier].copy()

dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub).copy()

dfb['isOutlier'] = ~dfb['valor_unitario'].copy().between(lb, ub)

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

I have no more ideas, and would greatly appreciate your support.

------- Minimum Reproducible Example --------

Main_testing.py

import pandas as pd
import calculoindice_support as indice # module 01
import getitemsid_support as getitems # module 02

df = pd.DataFrame({'loja':[1,4,6,6,4,5,7,8],
                   'cod_produto':[21,21,21,55,55,43,26,30],
                   'valor_unitario':[332.21,333.40,333.39,220.40,220.40,104.66,65.00,14.00],
                   'documento':['324234','434144','532552','524523','524525','423844','529585','239484'],
                   'empresa':['ABC','ABC','ABC','ABC','ABC','CDE','CDE','CDE']
                   })

nome_coluna = 'cod_produto'
# getting items id to loop over them
product_ids = getitems.getitemsid(df, nome_coluna)

# initializing main DF with no data 
df_nf = pd.DataFrame(columns=list(df.columns.values))

n = 0
while n < len(product_ids):
    item = product_ids[n]
    df_item = df[df[nome_coluna] == item]
    # assigning bounds to each variable
    lb, ub = indice.limites(df_item, 10)
    # calculating index over DF, using LB and UB
    # creating temporary (for each loop) DF
    df_nf_aux = indice.indice(df_item, lb, ub)
    # assigning temporary DF to main DF that will be exported later
    df_nf = pd.concat([df_nf, df_nf_aux],ignore_index=True)
    n += 1

calculoindice_support.py (module 01)

import pandas as pd

def limites(dfa,n):
    n_sigma = n * dfa.valor_unitario.std()
    mean = dfa.valor_unitario.mean()
    lb: float = mean - n_sigma
    ub: float = mean + n_sigma
    return (lb, ub)


def indice(dfb, lb, ub):
    if lb == ub:
        dfb.loc[:, 'isOutlier'] = False
        dfb.loc[:, 'indice'] = 1
    else:
        dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
        dfb = dfb[~dfb.isOutlier]

        dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
        # df = df.astype({'indice': 'int64'})

    return dfb

getitemsid_support.py (module 02)

def getitemsid(df, coluna):
    a = df[coluna].tolist()
    return list(set(a))

Warning output:

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = value
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1720: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

Ynjxsjmh · Accepted Answer

The problem is in your Main_testing.py:

while n < len(product_ids):
    df_item = df[df[nome_coluna] == item]

    df_nf_aux = indice.indice(df_item, lb, ub)

First, you slice df with the condition df[nome_coluna] == item. This returns a copy of the dataframe (you can check this by accessing the _is_view or _is_copy attribute). Then you pass that filtered dataframe to the indice function.

def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

In the indice function, you assign a new column to the filtered dataframe. This is an implicit chained assignment. pandas doesn't know whether you want to add the new column to the original dataframe or only to the filtered one, so it gives you a warning.
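
To see this in isolation, here is a minimal sketch with toy data. The names df, df_item and caught, and the bounds 300 and 400, are only illustrative and not taken from your code. It filters a dataframe, adds a column to the filtered result while recording any warnings, and then checks which frame actually received the column. Whether a warning is recorded depends on the pandas version and its copy-on-write settings, so the sketch prints the recorded categories instead of assuming them.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'cod_produto': [21, 21, 55],
                   'valor_unitario': [332.21, 333.40, 220.40]})

# Boolean-mask filtering returns a new object derived from df.
df_item = df[df['cod_produto'] == 21]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Adding a column to the filtered frame is the step that triggers the
    # warning on pandas versions that still emit SettingWithCopyWarning.
    df_item['isOutlier'] = ~df_item['valor_unitario'].between(300, 400)

# Names of whatever warning categories were recorded (may be empty).
print([w.category.__name__ for w in caught])
# Check which frame received the new column.
print('isOutlier' in df_item.columns)
print('isOutlier' in df.columns)
```

The new column ends up on the filtered frame only; the original df is not modified, which is exactly the ambiguity the warning is pointing at.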

To suppress this warning, you can explicitly tell pandas what you want to do:

def indice(dfb, lb, ub):
    dfb = dfb.copy()
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

In the case above, I create a copy of the filtered dataframe. This tells pandas that I want to add the new column to the filtered dataframe, not the original.
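
Applied to the whole function, the same idea might look like the sketch below. It is an assumption about how you would want the rest of the function to behave, not the only way to write it: the second copy() after filtering and the inline astype('int64') are my additions, and the sample data and the bounds 330.0 and 340.0 are made up for illustration.

```python
import pandas as pd


def indice(dfb, lb, ub):
    # Work on an explicit copy so the caller's frame is never modified.
    dfb = dfb.copy()
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

    # Filtering produces a new derived object, so copy again before
    # adding another column to it.
    dfb = dfb[~dfb['isOutlier']].copy()
    scaled = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    # astype('int64') truncates the fractional part.
    dfb['indice'] = scaled.astype('int64')
    return dfb


df = pd.DataFrame({'cod_produto': [21, 21, 21, 55],
                   'valor_unitario': [332.21, 333.40, 333.39, 220.40]})
df_item = df[df['cod_produto'] == 21]

# Illustrative bounds; in the question they come from limites().
result = indice(df_item, 330.0, 340.0)
print(result)
```

Because the function copies its input, df_item in the calling loop stays untouched and only the returned frame carries the isOutlier and indice columns.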