Reputation: 141
Note: I've spent a few hours searching SO, the Pandas docs and a few other websites, but couldn't understand why my code isn't working.
def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    dfb = dfb[~dfb.isOutlier]
    dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    dfb = dfb.astype({'indice': 'int64'})
    return dfb
isOutlier column does not exist. I'm creating it right now in this function.
indice column does not exist. I'm creating it right now in this function.
valor_unitario exists and it's a float.
lb and ub are previously defined.
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
I found a few articles and questions on the web and on StackOverflow saying that using loc would solve the problem. I tried it, but with no success:
def indice(dfb, lb, ub):
    dfb.loc[:,'isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)  # <- changed
    dfb = dfb[~dfb.isOutlier]
    dfb.loc[:,'indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000  # <- changed
    dfb = dfb.astype({'indice': 'int64'})
    return dfb
I also tried using loc on one assignment at a time; in fact, I tried a lot of possible combinations, such as using df.loc in dfb['valor_unitario'], and so on.
Now I get the same warning twice, but slightly different: one points at self._setitem_single_column(ilocs[0], value, pi) and the other at self.obj[key] = value.
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1676: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self._setitem_single_column(ilocs[0], value, pi)
and
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self.obj[key] = value
I also tried using copy. The first time this warning showed up, simply using copy() solved the problem. I don't know why it's not working now (I just loaded more data). I tried placing copy() in three places, with no success:
dfb = dfb[~dfb.isOutlier].copy()
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub).copy()
dfb['isOutlier'] = ~dfb['valor_unitario'].copy().between(lb, ub)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
I have no more ideas and would greatly appreciate your help.
# Main_testing.py
import pandas as pd
import calculoindice_support as indice  # module 01
import getitemsid_support as getitems  # module 02

df = pd.DataFrame({'loja': [1,4,6,6,4,5,7,8],
                   'cod_produto': [21,21,21,55,55,43,26,30],
                   'valor_unitario': [332.21,333.40,333.39,220.40,220.40,104.66,65.00,14.00],
                   'documento': ['324234','434144','532552','524523','524525','423844','529585','239484'],
                   'empresa': ['ABC','ABC','ABC','ABC','ABC','CDE','CDE','CDE']
                   })
nome_coluna = 'cod_produto'
# getting item ids to loop over them
product_ids = getitems.getitemsid(df, nome_coluna)
# initializing main DF with no data
df_nf = pd.DataFrame(columns=list(df.columns.values))
n = 0
while n < len(product_ids):
    item = product_ids[n]
    df_item = df[df[nome_coluna] == item]
    # assigning bounds to each variable
    lb, ub = indice.limites(df_item, 10)
    # calculating index over DF, using LB and UB
    # creating temporary (for each loop) DF
    df_nf_aux = indice.indice(df_item, lb, ub)
    # assigning temporary DF to main DF that will be exported later
    df_nf = pd.concat([df_nf, df_nf_aux], ignore_index=True)
    n += 1
# calculoindice_support.py (module 01)
import pandas as pd

def limites(dfa, n):
    n_sigma = n * dfa.valor_unitario.std()
    mean = dfa.valor_unitario.mean()
    lb: float = mean - n_sigma
    ub: float = mean + n_sigma
    return (lb, ub)

def indice(dfb, lb, ub):
    if lb == ub:
        dfb.loc[:, 'isOutlier'] = False
        dfb.loc[:, 'indice'] = 1
    else:
        dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
        dfb = dfb[~dfb.isOutlier]
        dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
        # df = df.astype({'indice': 'int64'})
    return dfb
# getitemsid_support.py (module 02)
def getitemsid(df, coluna):
    a = df[coluna].tolist()
    return list(set(a))
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self.obj[key] = value
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1720: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(loc, value, pi)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
Upvotes: 0
Views: 1578
Reputation: 30070
The problem is in your Main_testing.py:
while n < len(product_ids):
    df_item = df[df[nome_coluna] == item]
    df_nf_aux = indice.indice(df_item, lb, ub)
First you slice your df with the condition df[nome_coluna] == item, which returns a copy of the dataframe (you can check this by accessing the _is_view or _is_copy attribute). Then you pass that filtered dataframe to the indice method.
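A minimal sketch of that check, assuming a small made-up frame in place of the real data. Note that `_is_copy` is a private pandas attribute: on versions that emit SettingWithCopyWarning it usually holds a weak reference to the parent frame, while on newer versions (or with copy-on-write enabled) it may be None or missing, so it is read with `getattr` here.

```python
import pandas as pd

# Hypothetical miniature of the question's data (values are made up).
df = pd.DataFrame({
    "cod_produto": [21, 21, 55],
    "valor_unitario": [332.21, 333.40, 220.40],
})

# Boolean-mask selection, as done inside the while loop.
df_item = df[df["cod_produto"] == 21]

# Private attribute, read defensively because it is not part of the public API.
marker = getattr(df_item, "_is_copy", None)
print("filtered frame is flagged as a copy:", marker is not None)

# An explicit copy carries no such marker.
explicit = df_item.copy()
print("explicit copy is flagged:", getattr(explicit, "_is_copy", None) is not None)
```

Whatever the flag says on your version, the filtered frame holds its own data, so a column added to it never shows up in df.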
def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
In the indice method, you assign a new column to the filtered dataframe. This is an implicit chained assignment. Pandas doesn't know whether you want to add the new column to the original dataframe or only to the filtered dataframe, so it gives you a warning.
To suppress this warning, you can explicitly tell pandas what you want to do:
def indice(dfb, lb, ub):
    dfb = dfb.copy()
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
In the above case, I create a copy of the filtered dataframe. This tells pandas that I want to add the new column to the filtered dataframe, not to the original.
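For reference, here is a hedged sketch of the whole function with the copy applied, run against a made-up frame. Two things here are assumptions added for illustration and are not in the original code: the second `.copy()` after the outlier filter (that filter produces another slice, so the later `indice` assignment could otherwise warn too), and the check through the standard `warnings` module. The bounds 330.0 and 335.0 are arbitrary.

```python
import warnings

import pandas as pd


def indice(dfb, lb, ub):
    # Work on an explicit copy so the new columns clearly belong to it.
    dfb = dfb.copy()
    dfb["isOutlier"] = ~dfb["valor_unitario"].between(lb, ub)
    # Filtering yields a new slice; copy again before adding another column.
    dfb = dfb[~dfb["isOutlier"]].copy()
    dfb["indice"] = (dfb["valor_unitario"] - lb) / (ub - lb) * 2000
    return dfb


# Hypothetical data: the third row of product 21 lies outside the bounds.
df = pd.DataFrame({
    "cod_produto": [21, 21, 21, 55],
    "valor_unitario": [332.21, 333.40, 400.00, 220.40],
})
df_item = df[df["cod_produto"] == 21]

# Record every warning raised while the function runs.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = indice(df_item, lb=330.0, ub=335.0)

copy_warnings = [w for w in caught
                 if w.category.__name__ == "SettingWithCopyWarning"]
print("SettingWithCopyWarning count:", len(copy_warnings))
print(result)
```

Because the function only ever writes to its own copies, neither df nor df_item gains the isOutlier or indice columns.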
Upvotes: 3