Reputation: 1159
I want to convert the type of a column to int using pandas. Here's the source code:
# CustomerID is missing on several rows. Drop these rows and encode customer IDs as Integers.
cleaned_data = retail_data.loc[pd.isnull(retail_data.CustomerID) == False]
cleaned_data['CustomerID'] = cleaned_data.CustomerID.astype(int)
This raises the warning below:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
How can I avoid this warning? Is there a better way to convert the type of CustomerID to int? I'm on Python 3.5.
Upvotes: 3
Views: 1594
Reputation: 11895
Do the selection and the assignment in a single loc call:
retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'] = retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'].astype(int)
Example:
import pandas as pd
import numpy as np
retail_data = pd.DataFrame(np.random.rand(4,1)*10, columns=['CustomerID'])
retail_data.iloc[2,0] = np.nan
print(retail_data)
CustomerID
0 9.872067
1 5.645863
2 NaN
3 9.008643
retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'] = retail_data.loc[~retail_data.CustomerID.isnull(),'CustomerID'].astype(int)
print(retail_data)
CustomerID
0 9.0
1 5.0
2 NaN
3 9.0
You'll notice that the dtype of the column is still float, because np.nan cannot be encoded in an int column.
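To see why the column stays float, here is a minimal sketch. The sample values are made up for illustration and the variable names are hypothetical. It shows that casting a column that still contains NaN to int fails, while the same cast works once the NaN rows are gone:

```python
import numpy as np
import pandas as pd

# Hypothetical sample column with one missing value
s = pd.Series([9.0, 5.0, np.nan, 9.0], name="CustomerID")

# Casting directly to int fails because NaN has no integer representation
try:
    s.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# With NaN present, the column keeps a float dtype
dtype_with_nan = s.dtype

# After dropping the NaN rows, the cast to int succeeds
dtype_without_nan = s.dropna().astype(int).dtype

print(cast_failed, dtype_with_nan, dtype_without_nan)
```

The exact integer width after the cast depends on the platform and NumPy version, so it is not shown here.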
If you really want to drop these rows without changing the underlying retail_data, make an actual copy():
cleaned_data = retail_data.loc[~retail_data.CustomerID.isnull()].copy()
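Once cleaned_data is an independent copy, the integer cast from the question can be applied to it. A minimal sketch, assuming a small made-up retail_data frame (the ID values are hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's retail_data
retail_data = pd.DataFrame({"CustomerID": [17850.0, np.nan, 13047.0]})

# Keep only rows with a CustomerID, and take an explicit copy so that
# later assignments target a new DataFrame rather than a slice of the original
cleaned_data = retail_data.loc[retail_data.CustomerID.notnull()].copy()

# No NaN values remain, so the cast to int is now possible
cleaned_data["CustomerID"] = cleaned_data["CustomerID"].astype(int)

print(cleaned_data.dtypes)
print(retail_data.dtypes)  # the original frame is left untouched
```

Because the assignment is made on an explicit copy, pandas has no ambiguity about whether the original frame should change, which is what the SettingWithCopyWarning is about.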
Upvotes: 3