Dimitris Poulopoulos

Reputation: 1159

How to avoid SettingWithCopyWarning in pandas?

I want to convert the type of a column to int using pandas. Here's the source code:

# CustomerID is missing on several rows. Drop these rows and encode customer IDs as Integers.
cleaned_data = retail_data.loc[pd.isnull(retail_data.CustomerID) == False]
cleaned_data['CustomerID'] = cleaned_data.CustomerID.astype(int)

This raises the warning below:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

How can I avoid this warning? Is there a better way to convert the type of CustomerID to int? I'm on Python 3.5.

Upvotes: 3

Views: 1594

Answers (1)

Julien Marrec

Reputation: 11895

Do the selection and the assignment in a single loc call on the original DataFrame:

retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'] = retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'].astype(int)

Example:

import pandas as pd
import numpy as np

retail_data = pd.DataFrame(np.random.rand(4,1)*10, columns=['CustomerID'])
retail_data.iloc[2,0] = np.nan
print(retail_data)

   CustomerID
0    9.872067
1    5.645863
2         NaN
3    9.008643

retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'] = retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'].astype(int)
print(retail_data)

   CustomerID
0         9.0
1         5.0
2         NaN
3         9.0

You'll notice that the dtype of the column is still float: NaN is a floating-point value and cannot be stored in a NumPy int column, so the cast only truncates the decimals of the non-null values.
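If the missing rows need to stay and the column should still hold integers, one option on newer pandas versions (0.24 and later, so not necessarily what the asker had available) is the nullable "Int64" dtype. The following is a minimal sketch with made-up, whole-number IDs; the sample values are an assumption for illustration, and the cast expects the floats to already be whole numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: whole-number IDs stored as floats, one missing.
retail_data = pd.DataFrame({"CustomerID": [17850.0, 13047.0, np.nan, 12583.0]})

# "Int64" (capital I) is pandas' nullable integer dtype, so the missing
# value is kept as a missing entry instead of forcing the column to float.
retail_data["CustomerID"] = retail_data["CustomerID"].astype("Int64")

print(retail_data.dtypes)
print(retail_data)
```

Because the assignment targets a whole column of the original DataFrame rather than a slice of it, it should not trigger SettingWithCopyWarning.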

If you really want to drop these rows without changing the underlying retail_data, take an explicit copy with .copy(), so that later assignments operate on an independent DataFrame:

cleaned_data = retail_data.loc[~retail_data.CustomerID.isnull()].copy()
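Put together with the original goal, the copy approach might look like the sketch below. The sample DataFrame is hypothetical and stands in for the asker's retail_data; notnull() is used as the equivalent of ~isnull().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's retail_data.
retail_data = pd.DataFrame({"CustomerID": [17850.0, 13047.0, np.nan, 12583.0]})

# Take an explicit copy of the filtered rows so that later writes
# go to an independent DataFrame, not to a slice of retail_data.
cleaned_data = retail_data.loc[retail_data["CustomerID"].notnull()].copy()

# No missing values remain, so a plain int cast works on the whole column.
cleaned_data["CustomerID"] = cleaned_data["CustomerID"].astype(int)

print(cleaned_data.dtypes)
print(len(retail_data), len(cleaned_data))
```

Here retail_data keeps its NaN row and float dtype, while cleaned_data ends up with an integer CustomerID column.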

Upvotes: 3
