matt cooper

Reputation: 101

Python warning confusion

I have the following code which correlates data brought in from PgSQL.

if wd is not None and dd is not None:
    alldata=np.concatenate((wd,dd))
    alldat_df=pd.DataFrame(alldata, index=None, columns=['datetime','rain', 'raindiff'])

    alldat_df.drop(alldat_df.loc[2708:2738].index, inplace=True)
    alldata=np.array(alldat_df)
    alldata[0,2]=0
    mask = (alldat_df['datetime'] > fdate) & (alldat_df['datetime'] <= tdate)
    ndf=alldat_df.loc[mask]
    ndf.loc[0,['raindiff']]=0
    ndf.index=ndf['datetime']
    ndf.drop(columns=['datetime'], inplace=True)
    davisdfnew=ndf.resample(bs, offset=bs, origin=fdate).sum()
    davisdfnew.rename(columns={'rain':'rain sum','raindiff':'raindiff sum'}, inplace=True)
   

if dd is None:
    alldat_df=pd.DataFrame(wd, index=None, columns=['datetime', 'rain', 'raindiff'])
    mask = (alldat_df['datetime'] > fdate) & (alldat_df['datetime'] <= tdate)
    ndf=alldat_df.loc[mask]
    ndf.loc[0,['raindiff']]=0
    ndf.index=ndf['datetime']
    ndf.drop(columns=['datetime'], inplace=True)
    davisdfnew=ndf.resample(bs, offset=bs, origin=fdate).sum()
    davisdfnew.rename(columns={'rain':'rain sum','raindiff':'raindiff sum'}, inplace=True)
    

if wd is None:
    alldat_df=pd.DataFrame(dd, index=None, columns=['datetime', 'rain', 'raindiff'])
    mask = (alldat_df['datetime'] > fdate) & (alldat_df['datetime'] <= tdate)
    ndf=alldat_df.loc[mask]
    ndf.loc[0,['raindiff']]=0
    ndf.index=ndf['datetime']
    ndf.drop(columns=['datetime'], inplace=True)
    davisdfnew=ndf.resample(bs, offset=bs, origin=fdate).sum()
    davisdfnew.rename(columns={'rain':'rain sum','raindiff':'raindiff sum'}, inplace=True)

When it runs and either of the first two if conditions is met, it raises the following warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ndf.loc[0,['raindiff']]=0

But when the condition if wd is None is met, there is no warning. In all cases the value at ndf.loc[0,['raindiff']] is a NoneType object.

I would appreciate it if someone could shed some light on this!

edited as per @john giorgio comment

wd=

array([[datetime.datetime(2021, 5, 20, 10, 45), 0.0, None],
       [datetime.datetime(2021, 5, 20, 11, 0), 0.0, 0.0],
       [datetime.datetime(2021, 5, 20, 11, 15), 0.0, 0.0],
       ...,
       [datetime.datetime(2021, 6, 17, 22, 30), 96.6, 0.0],
       [datetime.datetime(2021, 6, 17, 22, 45), 96.6, 0.0],
       [datetime.datetime(2021, 6, 17, 23, 0), 96.6, 0.0]], dtype=object)

dd=

array([[datetime.datetime(2021, 6, 17, 15, 30, 42), 96.6, None],
       [datetime.datetime(2021, 6, 17, 15, 35, 42), 96.6, 0.0],
       [datetime.datetime(2021, 6, 17, 15, 40, 42), 96.6, 0.0],
       ...,
       [datetime.datetime(2021, 6, 30, 23, 45, 41), 113.8, 0.0],
       [datetime.datetime(2021, 6, 30, 23, 50, 41), 113.8, 0.0],
       [datetime.datetime(2021, 6, 30, 23, 55, 41), 113.8, 0.0]],
      dtype=object)

As I said, the warning occurs whenever wd exists. If both wd and dd exist, they are combined and duplicate datetimes are removed to give ndf. If only wd exists, ndf is formed from it. In both of these cases the warning occurs.

If only dd exists, ndf is formed from it, and the warning does not occur.

Upvotes: 0

Views: 53

Answers (1)

John Giorgio

Reputation: 659

Try resetting the index with .reset_index(drop=True) each time you take a subsample of your original dataset, before performing any other action.
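As a rough sketch of that suggestion (not tested against your real data), the filtering step could look something like the following. The small wd array and the fdate/tdate values here are made-up stand-ins for yours. The .copy() call is an addition of mine on top of the reset: it makes the subset an independent DataFrame, which is the usual way to avoid SettingWithCopyWarning. Note that row labels keep their original values after filtering, so label 0 is not necessarily the first row of the subset until the index is reset.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical stand-in for the `wd` array in the question (object dtype).
wd = np.array(
    [
        [dt.datetime(2021, 5, 20, 10, 45), 0.0, None],
        [dt.datetime(2021, 5, 20, 11, 0), 0.2, 0.2],
        [dt.datetime(2021, 5, 20, 11, 15), 0.4, 0.2],
        [dt.datetime(2021, 5, 20, 11, 30), 0.4, 0.0],
    ],
    dtype=object,
)

# Assumed date window; the question does not show the real values.
fdate = dt.datetime(2021, 5, 20, 10, 50)
tdate = dt.datetime(2021, 5, 20, 11, 30)

alldat_df = pd.DataFrame(wd, columns=["datetime", "rain", "raindiff"])
mask = (alldat_df["datetime"] > fdate) & (alldat_df["datetime"] <= tdate)

# Take an explicit copy of the subset and renumber its rows from 0,
# so that label 0 refers to the first row that survived the filter.
ndf = alldat_df.loc[mask].copy().reset_index(drop=True)

# This now modifies the first row of ndf only; alldat_df is untouched.
ndf.loc[0, "raindiff"] = 0

# Continue as in the question: use the datetime column as the index.
ndf = ndf.set_index("datetime")
```

The same pattern would apply in each of the three branches, right after the line that builds ndf from the mask.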

Upvotes: 1
