Reputation: 537
I am trying to perform PCA on data from a CSV file, but I keep getting a series of strange warnings when I attempt to scale the data.
import pandas as pd
from sklearn import decomposition, preprocessing

def prepare_data(filename):
    # Use the first CSV column as the index and drop rows with missing values.
    df = pd.read_csv(filename, index_col=0)
    df.dropna(axis=0, how='any', inplace=True)
    return df

def perform_PCA(df):
    threshold = 0.3
    component = 1  # Second of two right now
    pca = decomposition.PCA(n_components=2)
    print(df.head())
    # The warnings below are raised by this call.
    scaled_data = preprocessing.scale(df)
    #pca.fit(scaled_data)
    #transformed = pca.transform(scaled_data)
    #pca_components_df = pd.DataFrame(data = pca.components_,columns = df.columns.values)
These are the warnings I keep getting:
C:\Users\mbellissimo\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\utils\validation.py:498: UserWarning: The scale function assumes floating point values as input, got int64
"got %s" % (estimator, X.dtype))
C:\Users\mbellissimo\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\preprocessing\data.py:145: DeprecationWarning: Implicitly casting between incompatible kinds. In a future numpy release, this will raise an error. Use casting="unsafe" if this is intentional.
Xr -= mean_
C:\Users\mbellissimo\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\preprocessing\data.py:153: UserWarning: Numerical issues were encountered when centering the data and might not be solved. Dataset may contain too large values. You may need to prescale your features.
warnings.warn("Numerical issues were encountered "
C:\Users\mbellissimo\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\preprocessing\data.py:158: DeprecationWarning: Implicitly casting between incompatible kinds. In a future numpy release, this will raise an error. Use casting="unsafe" if this is intentional.
Xr -= mean_1
C:\Users\mbellissimo\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\preprocessing\data.py:160: DeprecationWarning: Implicitly casting between incompatible kinds. In a future numpy release, this will raise an error. Use casting="unsafe" if this is intentional.
Xr /= std_
C:\Users\mbellissimo\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\preprocessing\data.py:169: UserWarning: Numerical issues were encountered when scaling the data and might not be solved. The standard deviation of the data is probably very close to 0.
warnings.warn("Numerical issues were encountered "
C:\Users\mbellissimo\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\preprocessing\data.py:174: DeprecationWarning: Implicitly casting between incompatible kinds. In a future numpy release, this will raise an error. Use casting="unsafe" if this is intentional.
Xr -= mean_2
All the values in the CSV file are numbers. This is what the head looks like:
             TOOLS/TEST EQUIPMENT  WIN PRODUCTIVITY/UTILITY  \
HouseholdID
144748819                       0                         0
144764123                       0                         0
144765100                       0                         0
144765495                       2                         0
144765756                       0                         2
Can somebody please tell me why I am getting this warning and how I can fix it?
Upvotes: 1
Views: 6131
Reputation: 537
I figured it out. The columns were being read as int64, and scale expects floating point input. I had to convert my DataFrame into a NumPy array and cast it to float before scaling:
numpyMatrix = df.as_matrix().astype(float)
scaled_data = preprocessing.scale(numpyMatrix)
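For anyone on a newer pandas: as_matrix() was removed in pandas 1.0, so the line above will not run there. Below is a minimal sketch of the same idea using to_numpy(), assuming pandas 0.24 or later. The small DataFrame and its column names (feature_a, feature_b) are made up to stand in for the CSV data, and wrapping the result back into a DataFrame is just a convenience for keeping the labels, not something the original code did.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical integer-valued data standing in for the CSV contents.
df = pd.DataFrame(
    {"feature_a": [0, 0, 0, 2, 0], "feature_b": [0, 0, 0, 0, 2]},
    index=pd.Index([1, 2, 3, 4, 5], name="HouseholdID"),
)

# Convert to a NumPy array and cast to float in one step,
# so scale() never sees int64 input.
float_values = df.to_numpy(dtype=float)

# Standardize each column to zero mean and unit variance.
scaled_data = preprocessing.scale(float_values)

# Optional: restore the row and column labels for later inspection.
scaled_df = pd.DataFrame(scaled_data, index=df.index, columns=df.columns)
```

Calling df.astype(float) before scaling should work just as well if you would rather stay in pandas the whole way.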
Upvotes: 6