Dawn17

Reputation: 8297

SettingWithCopyWarning in Pandas DataFrame using Python

My task is to drop all rows containing NaNs and encode all the categorical variables inside of data.

I wrote a function that looks like

from sklearn.preprocessing import LabelEncoder

def preprocess_data(data):

    data = data.dropna()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])

    return data

which takes a DataFrame and returns the processed data. Running this function gives me a warning that says:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I don't quite get which part of my code is causing this and how to fix it.

Upvotes: 2

Views: 554

Answers (2)

Tell pandas that data is its own DataFrame (and not a slice of another one) by calling .copy() on the result of dropna():

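from sklearn.preprocessing import LabelEncoder

def preprocess_data(data):

    # .copy() gives an independent DataFrame, so the assignment below is unambiguous
    data = data.dropna().copy()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])

    return data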

A more detailed explanation here: https://github.com/pandas-dev/pandas/issues/17476
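
To check that the copy really is independent of the input, here is a minimal sketch using only pandas. The sample frame and its mpg column are made up for illustration, and .astype("category").cat.codes is used as a stand-in for LabelEncoder so the snippet runs without scikit-learn (for plain string columns it should assign the same sorted integer codes).

```python
import numpy as np
import pandas as pd


def preprocess_data(data: pd.DataFrame) -> pd.DataFrame:
    # dropna() removes every row that has at least one missing value;
    # .copy() makes the result an independent DataFrame.
    cleaned = data.dropna().copy()
    # Stand-in for LabelEncoder (assumption): categories are sorted,
    # and each value is replaced by its integer position.
    cleaned["car name"] = cleaned["car name"].astype("category").cat.codes
    return cleaned


# Hypothetical sample data: rows 2 and 3 each contain a missing value.
raw = pd.DataFrame({
    "car name": ["nissan", "dacia", None, "audi"],
    "mpg": [30.0, 25.0, 20.0, np.nan],
})

processed = preprocess_data(raw)
print(processed)
print(raw)  # the input frame still holds the original strings
```

Because the function works on a copy, raw is left untouched, which is usually what you want from a preprocessing step.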

Upvotes: 1

Romain

Reputation: 21998

You may need to give more information; the problem might not be in this function. The following code does not produce a warning:

import pandas as pd
from sklearn import preprocessing

def preprocess_data(data):

    data = data.dropna()
    le = preprocessing.LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])
    return data


preprocess_data(pd.DataFrame({'car name': ['nissan', 'dacia'], 'car mode': ['juke', 'logan']}))

#   car mode  car name
# 0     juke         1
# 1    logan         0
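
To find out which input actually triggers the warning, one option is to record warnings while calling the function on different frames. This is only a rough sketch: warning_names is a hypothetical helper, the with_nan frame is made up, and cat.codes again stands in for LabelEncoder. Whether SettingWithCopyWarning shows up at all depends on the pandas version and on how the input frame was built, so no output is shown here.

```python
import warnings

import pandas as pd


def preprocess_data(data):
    # Same shape as the function in the question, without .copy().
    data = data.dropna()
    # Stand-in for LabelEncoder (assumption), so scikit-learn is not needed.
    data["car name"] = data["car name"].astype("category").cat.codes
    return data


def warning_names(frame):
    """Run preprocess_data on frame and return the class names of all warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        preprocess_data(frame)
    return [w.category.__name__ for w in caught]


# Frame from the answer above: no missing values, so dropna() removes nothing.
clean = pd.DataFrame({"car name": ["nissan", "dacia"], "car mode": ["juke", "logan"]})

# Hypothetical frame where dropna() really has to drop a row.
with_nan = pd.DataFrame({"car name": ["nissan", "dacia", None],
                         "car mode": ["juke", "logan", "sandero"]})

print(warning_names(clean))
print(warning_names(with_nan))
```

If the two lists differ, the warning comes from the data passed in rather than from the function body alone.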

Upvotes: 0
