bisamov

Reputation: 730

"A value is trying to be set on a copy of a slice from a DataFrame" warning when using pandas during initialization

I am trying to initialize an instance by passing it a DataFrame, but for some reason I get the warning shown after the code:

class TestReg:
    def __init__(self, x, y, create_intercept=False):
        self.x = x
        self.y = y
        if create_intercept:
            self.x['intercept'] = 1

x = data[['class', 'year']]
y = data['performance']
reg = TestReg(x, y, create_intercept=True)

Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.x['intercept'] = 1

Any idea what I am doing wrong?

Upvotes: 2

Views: 3231

Answers (1)

Serge Ballesta

Reputation: 148965

You are trying to change values in an extract of a DataFrame (a slice, in pandas terminology).

Stripped down, what you are trying to do is:

x = data[['class', 'year']]    # x is a slice here
x['intercept'] = 1             # dangerous because behaviour is undefined => warning

pandas may give you either a copy or a view when you take a slice (here, two columns of a DataFrame). The difference does not matter while you only read the data, but it does once you modify it: the change may or may not reach the original DataFrame, hence the warning.
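To make this concrete, here is a minimal sketch with made-up data (the values are hypothetical; only the column names come from the question). As far as I know, selecting with a list of column names produces a new object, so the assignment lands on the extract and never reaches `data`. Whether the warning is actually printed depends on the pandas version.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    'class': [1, 2, 1],
    'year': [2018, 2019, 2020],
    'performance': [0.5, 0.7, 0.9],
})

x = data[['class', 'year']]   # extract of two columns
x['intercept'] = 1            # may emit SettingWithCopyWarning, depending on the pandas version

print(list(x.columns))        # the extract has the new column
print(list(data.columns))     # the original DataFrame does not
```

The point of the warning is that pandas cannot tell whether you meant to change `data` or only `x`, so it asks you to be explicit.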

You should pass the original dataframe and only make changes through it:

class TestReg:
    def __init__(self, data, cols, y, create_intercept=False):
        self.data = data
        self.y = y
        if create_intercept:
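            self.data['intercept'] = 1     # adds the column to the original DataFrame
            cols = cols + ['intercept']    # build a new list so the caller's list is not mutated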
        self.x = data[cols]
...
reg = TestReg(data, ['class', 'year'], y, create_intercept=True)

Alternatively, you could force a copy if you do not want to change the original dataframe:

...
x = data[['class', 'year']].copy()
y = data['performance']
reg = TestReg(x, y, create_intercept=True)
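A variation on the same idea, sketched under the assumption that the class should own its inputs: take the copy inside `__init__`, so callers do not have to remember to call `.copy()` themselves. The sample values below are hypothetical; the class and column names are the ones from the question.

```python
import pandas as pd


class TestReg:
    def __init__(self, x, y, create_intercept=False):
        # Copy up front so later assignments never touch the caller's DataFrame.
        self.x = x.copy()
        self.y = y
        if create_intercept:
            self.x['intercept'] = 1


# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    'class': [1, 2, 1],
    'year': [2018, 2019, 2020],
    'performance': [0.5, 0.7, 0.9],
})

reg = TestReg(data[['class', 'year']], data['performance'], create_intercept=True)
print(list(reg.x.columns))    # includes 'intercept'
print(list(data.columns))     # unchanged
```

The trade-off is an extra copy of the selected columns in memory, which is usually acceptable unless the DataFrame is very large.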

Upvotes: 4
