Faheem Mitha

Pandas SettingWithCopyWarning for unclear reason

Consider the following example code:

import pandas as pd
import numpy as np

pd.set_option('display.expand_frame_repr', False)
foo = pd.read_csv("foo2.csv", skipinitialspace=True, index_col='Index')
foo.loc[:, 'Date'] = pd.to_datetime(foo.Date)

for i in range(0, len(foo)-1):
    if foo.at[i, 'Type'] == 'Reservation':
        for j in range(i+1, len(foo)):
            if foo.at[j, 'Type'] == 'Payout':
                foo.at[j, 'Nights'] = foo.at[i, 'Nights']
                break

mask = (foo['Date'] >= '2018-03-31') & (foo['Date'] <= '2019-03-31')
foo2019 = foo.loc[mask]
foopayouts2019 = foo2019.loc[foo2019['Type'] == 'Payout']
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
# foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)

with foo2.csv containing:

Index,Date,Type,Nights,Amount,Payout
0,03/07/2018,Reservation,2.0,1000.00,
1,03/07/2018,Payout,,,1000.00
2,09/11/2018,Reservation,3.0,1500.00,
3,09/11/2018,Payout,,,1500.00
4,02/16/2019,Reservation,2.0,2000.00,
5,02/16/2019,Payout,,,2000.00
6,04/25/2019,Reservation,7.0,1200.00,
7,04/25/2019,Payout,,,1200.00
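
To reproduce this without creating foo2.csv, the same rows can be read from a string. This is a convenience sketch: io.StringIO stands in for the file, and the explicit format="%m/%d/%Y" is an assumption based on the dates looking like month/day/year. Assigning with foo["Date"] = ... replaces the column outright.

```python
import io

import pandas as pd

# Contents of foo2.csv, inlined so the example runs without the file.
CSV_TEXT = """Index,Date,Type,Nights,Amount,Payout
0,03/07/2018,Reservation,2.0,1000.00,
1,03/07/2018,Payout,,,1000.00
2,09/11/2018,Reservation,3.0,1500.00,
3,09/11/2018,Payout,,,1500.00
4,02/16/2019,Reservation,2.0,2000.00,
5,02/16/2019,Payout,,,2000.00
6,04/25/2019,Reservation,7.0,1200.00,
7,04/25/2019,Payout,,,1200.00
"""

foo = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True, index_col="Index")
# Assumes month/day/year dates; an explicit format avoids day/month ambiguity.
foo["Date"] = pd.to_datetime(foo["Date"], format="%m/%d/%Y")
print(foo)
```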

This gives the following warning:

/usr/lib/python2.7/dist-packages/pandas/core/indexing.py:543: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s

The warning does not point to a line in my script (the line number refers to pandas' own indexing.py), but it appears to come from the line:

foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)

At least, if I comment that line out, the warning goes away. So I have two questions.

  1. What is causing that warning? I've been trying to use .loc where appropriate, including on the line the warning (possibly) comes from. If the problem is actually earlier, where is it?
  2. Which is the better choice, .apply or .astype, as used in the following lines of code?

    foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
    # foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)
    

    It seems that both of them work, except for that warning.


Answers (1)

anky

I would change a few things in the code:

Using shift(), we check whether the current row is a Reservation and the next row is a Payout, and then use np.where() to take the forward-filled (ffill) Nights everywhere except those Reservation rows, which keep their own value:

foo.Date = pd.to_datetime(foo.Date)  # convert to datetime
c = foo.Type.eq('Reservation') & foo.Type.shift(-1).eq('Payout')
foo.Nights = np.where(~c, foo.Nights.ffill(), foo.Nights)  # replace the if/else loop with np.where
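
A small check of this first variant, rebuilding only the Type and Nights columns of foo2.csv inline (the other columns are left out for brevity). Because every Reservation row already carries its own Nights, on this data the result is the same as a plain forward fill.

```python
import numpy as np
import pandas as pd

# Type and Nights columns from foo2.csv; Payout rows start out empty.
foo = pd.DataFrame(
    {
        "Type": ["Reservation", "Payout"] * 4,
        "Nights": [2.0, np.nan, 3.0, np.nan, 2.0, np.nan, 7.0, np.nan],
    }
)

# True on Reservation rows whose next row is a Payout.
c = foo["Type"].eq("Reservation") & foo["Type"].shift(-1).eq("Payout")
# Outside those rows take the forward-filled value; inside them keep Nights.
foo["Nights"] = np.where(~c, foo["Nights"].ffill(), foo["Nights"])
print(foo)
```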

Or:

c = foo.Type.shift().eq('Reservation') & foo.Type.eq('Payout')
foo.Nights = np.where(c, foo.Nights.ffill(), foo.Nights)
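
The two variants differ when a Payout is not directly preceded by a Reservation. On a made-up three-row frame with an extra, unmatched Payout (hypothetical data, not from foo2.csv), the first variant forward-fills into every later row, while the second only fills the Payout immediately after the Reservation. Note that, unlike the question's loop, neither variant looks past an intervening row to find the next Payout, so it is worth checking which behaviour the real data needs.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second Payout has no Reservation of its own.
foo = pd.DataFrame(
    {
        "Type": ["Reservation", "Payout", "Payout"],
        "Nights": [2.0, np.nan, np.nan],
    }
)
filled = foo["Nights"].ffill()

# Variant 1: keep Reservation rows that precede a Payout, ffill everything else.
c1 = foo["Type"].eq("Reservation") & foo["Type"].shift(-1).eq("Payout")
variant1 = np.where(~c1, filled, foo["Nights"])

# Variant 2: only fill a Payout whose previous row is a Reservation.
c2 = foo["Type"].shift().eq("Reservation") & foo["Type"].eq("Payout")
variant2 = np.where(c2, filled, foo["Nights"])

print(variant1)
print(variant2)
```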

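Then use Series.between() to check whether the dates fall between two dates, and take an explicit .copy() of each filtered frame. Boolean indexing returns a frame that pandas flags as possibly derived from foo, so writing into it later raises SettingWithCopyWarning; .copy() makes it an independent DataFrame:

foo2019 = foo[foo.Date.between('2018-03-31', '2019-03-31')].copy()  # explicit copy
foopayouts2019 = foo2019[foo2019['Type'] == 'Payout'].copy()  # explicit copy avoids the warning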
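
A sketch that checks the .copy() fix on an already-filled version of the question's data (the Nights values are assumed to have been filled as above). It records warnings and looks for one whose class is named SettingWithCopyWarning; matching by name just avoids depending on where pandas exposes the class. Under Copy-on-Write, which newer pandas releases enable by default, this warning is not emitted at all, so the check passes there regardless.

```python
import warnings

import numpy as np
import pandas as pd

# The question's data after Nights has been filled in for each Payout.
foo = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2018-03-07", "2018-03-07", "2018-09-11", "2018-09-11",
             "2019-02-16", "2019-02-16", "2019-04-25", "2019-04-25"]
        ),
        "Type": ["Reservation", "Payout"] * 4,
        "Nights": [2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 7.0, 7.0],
    }
)

# Explicit copies: each filtered frame owns its data and is not tied back to foo.
foo2019 = foo[foo.Date.between("2018-03-31", "2019-03-31")].copy()
foopayouts2019 = foo2019[foo2019["Type"] == "Payout"].copy()

# Record every warning raised by the assignment that used to trigger it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    foopayouts2019.loc[:, "Nights"] = foopayouts2019["Nights"].apply(np.int64)

copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(len(copy_warnings))
print(foopayouts2019)
```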

Or directly:

foopayouts2019 = foo[foo.Date.between('2018-03-31', '2019-03-31') & foo.Type.eq('Payout')].copy()
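
Series.between() includes both endpoints by default, so it selects the same rows as the question's >= / <= mask. A quick check of that, and that the one-step filter matches the two-step version, on the same assumed filled data:

```python
import pandas as pd

foo = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2018-03-07", "2018-03-07", "2018-09-11", "2018-09-11",
             "2019-02-16", "2019-02-16", "2019-04-25", "2019-04-25"]
        ),
        "Type": ["Reservation", "Payout"] * 4,
        "Nights": [2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 7.0, 7.0],
    }
)
start, end = "2018-03-31", "2019-03-31"

# between() is inclusive on both ends by default, like the original mask.
in_range = foo.Date.between(start, end)
original_mask = (foo["Date"] >= start) & (foo["Date"] <= end)

# One-step filter versus filtering by date first and by Type second.
one_step = foo[in_range & foo.Type.eq("Payout")].copy()
by_date = foo[in_range]
two_step = by_date[by_date["Type"] == "Payout"].copy()

print(in_range.equals(original_mask))
print(one_step.equals(two_step))
```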

foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)  # or .astype(int)

   Index       Date    Type  Nights  Amount  Payout
3      3 2018-09-11  Payout       3     NaN  1500.0
5      5 2019-02-16  Payout       2     NaN  2000.0
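
On the second question: both calls produce the same integers here, but .astype() converts the whole column in one vectorized step, whereas .apply(np.int64) calls np.int64 once per element in Python, so .astype() is generally the more idiomatic choice. Both raise if a NaN is still present; the nullable "Int64" dtype is one way around that. Also, as far as I know, in pandas 2.x an assignment through .loc[:, 'Nights'] writes into the existing float64 column in place and can leave it as float64, whereas plain column assignment replaces the column. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

nights = pd.Series([3.0, 2.0], index=[3, 5], name="Nights")

via_apply = nights.apply(np.int64)    # one Python-level np.int64 call per element
via_astype = nights.astype(np.int64)  # a single vectorized conversion

# With a missing value, a plain int64 conversion raises; nullable Int64 keeps <NA>.
with_gap = pd.Series([3.0, np.nan])
try:
    with_gap.astype(np.int64)
    int64_failed = False
except ValueError:
    int64_failed = True
nullable = with_gap.astype("Int64")

# Plain column assignment replaces the column, so the new dtype sticks.
df = pd.DataFrame({"Nights": [3.0, 2.0]})
df["Nights"] = df["Nights"].astype(np.int64)

print(via_apply.tolist(), via_astype.dtype, int64_failed, nullable.tolist())
print(df.dtypes)
```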
