Reputation: 8297
My task is to drop all rows containing NaNs and encode all the categorical variables in data.
I wrote a function that looks like this:
from sklearn.preprocessing import LabelEncoder

def preprocess_data(data):
    data = data.dropna()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])
    return data
It takes a DataFrame and returns the processed data. Running this function gives me the following warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I don't understand which part of my code is causing this or how to fix it.
Upvotes: 2
Views: 554
Reputation: 419
Make sure pandas knows that data is its own DataFrame (and not a slice of another one) by calling .copy():
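from sklearn.preprocessing import LabelEncoder

def preprocess_data(data):
    # .copy() turns the filtered result into an independent DataFrame
    data = data.dropna().copy()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])
    return data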
There is a more detailed explanation here: https://github.com/pandas-dev/pandas/issues/17476
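A minimal sketch of the .copy() fix, plus an .assign() variant that avoids writing into the result of dropna() at all. To keep it dependency-free, it swaps sklearn's LabelEncoder for pandas' own astype("category").cat.codes; that substitution is my assumption, not what the answer uses, though for string labels both number the sorted unique values from 0. The function names and the sample frame are hypothetical.

```python
import pandas as pd


def preprocess_data(data: pd.DataFrame) -> pd.DataFrame:
    # .copy() makes the filtered frame independent of the caller's data,
    # so the column assignment below is not a write to a slice.
    data = data.dropna().copy()
    # Pandas-only stand-in for LabelEncoder: category codes number the
    # sorted unique labels 0, 1, 2, ...
    data["car name"] = data["car name"].astype("category").cat.codes
    return data


def preprocess_data_assign(data: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical variant: assign() builds and returns a new DataFrame
    # instead of modifying the result of dropna() in place.
    return data.dropna().assign(
        **{"car name": lambda d: d["car name"].astype("category").cat.codes}
    )


df = pd.DataFrame(
    {"car name": ["nissan", "dacia", None], "car mode": ["juke", "logan", "micra"]}
)
print(preprocess_data(df))
print(preprocess_data_assign(df))
```

Both calls drop the row with the missing name and encode "nissan" as 1 and "dacia" as 0. The assign() form is handy in method chains, since every step returns a fresh DataFrame.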
Upvotes: 1
Reputation: 21998
Maybe you should give more information; the problem may not be in this method. The following code does not produce the warning:
import pandas as pd
from sklearn import preprocessing

def preprocess_data(data):
    data = data.dropna()
    le = preprocessing.LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])
    return data

print(preprocess_data(pd.DataFrame({'car name': ['nissan', 'dacia'], 'car mode': ['juke', 'logan']})))
# car mode car name
# 0 juke 1
# 1 logan 0
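Whether the warning fires can depend on what is passed in. The input above has no NaNs, so dropna() removes nothing, and that may be why it stays quiet. A small helper lets you check with your real data. This is a hypothetical diagnostic, not a pandas API: it records warnings while a function runs and looks for one named SettingWithCopyWarning. The two test functions again use the pandas-only category-code encoding as an assumed stand-in for LabelEncoder.

```python
import warnings

import pandas as pd


def emits_setting_with_copy(func, frame: pd.DataFrame) -> bool:
    """Call func(frame) and report whether a SettingWithCopyWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # don't let earlier hits suppress this one
        func(frame)
    # Match on the class name so the check still runs on pandas versions
    # that no longer define SettingWithCopyWarning.
    return any(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


def without_copy(data):
    data = data.dropna()
    data["car name"] = data["car name"].astype("category").cat.codes
    return data


def with_copy(data):
    data = data.dropna().copy()
    data["car name"] = data["car name"].astype("category").cat.codes
    return data


clean = pd.DataFrame({"car name": ["nissan", "dacia"], "car mode": ["juke", "logan"]})
with_nan = pd.DataFrame({"car name": ["nissan", None], "car mode": ["juke", "logan"]})

for label, frame in [("no NaNs", clean), ("one NaN", with_nan)]:
    print(label, "without copy:", emits_setting_with_copy(without_copy, frame))
    print(label, "with copy:", emits_setting_with_copy(with_copy, frame))
```

The result for without_copy varies with the pandas version and the input, which is why it is only printed here. The with_copy version should report False in every case.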
Upvotes: 0