Reputation: 111
import numpy as np
from sklearn.preprocessing import StandardScaler
X = df.values[:,1:]
X = np.nan_to_num(X)
Clus_dataSet = StandardScaler().fit_transform(X)
Clus_dataSet
Can anyone explain what this code does?
Upvotes: 2
Views: 25711
Reputation: 11
df here refers to the DataFrame you are analysing.
In the line X = df.values[:,1:], df.values returns just the values of the DataFrame as a NumPy array, without the index or column labels. Inside the brackets, : selects all rows and 1: selects every column from index position 1 onward, so the column at index position 0 (the first column, which probably holds the dependent variable, I assume) is dropped.
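A minimal sketch of that split, assuming (as guessed above) that the first column is the dependent variable. The DataFrame and its column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "target" in the first column, features after it
df = pd.DataFrame({
    "target": [0, 1, 0],
    "age": [25.0, 32.0, np.nan],
    "income": [40.0, 55.0, 61.0],
})

y = df.values[:, 0]   # every row, only the column at index position 0
X = df.values[:, 1:]  # every row, columns from index position 1 to the end

print(y)        # [0. 1. 0.]
print(X.shape)  # (3, 2)
```

Because the columns mix ints and floats, df.values produces a single float64 array, which is why y prints as floats.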
Upvotes: 1
Reputation: 21
As Richie said, with X = df.values[:,1:]
you basically make X equal to your DataFrame (as a NumPy array), but it skips the first column.
X = np.nan_to_num(X)
replaces every NaN with 0.0 (and +inf/-inf with the largest/most negative finite float), so the array contains only finite numbers.
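A small sketch of what np.nan_to_num does with its defaults, plus the keyword arguments (available in NumPy 1.17 and later) for choosing the replacement values yourself. The input array is made up for illustration:

```python
import numpy as np

X = np.array([[1.0, np.nan],
              [np.inf, -np.inf]])

# Defaults: NaN -> 0.0, +inf -> largest finite float64, -inf -> most negative finite float64
X_clean = np.nan_to_num(X)
print(X_clean[0])  # [1. 0.]

# The replacement values can also be set explicitly
X_custom = np.nan_to_num(X, nan=-1.0, posinf=1e6, neginf=-1e6)
print(X_custom)
```

Replacing NaN with 0.0 is a blunt choice; if 0 is a meaningful value in your data, it can distort the scaling and clustering that follow.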
Clus_dataSet = StandardScaler().fit_transform(X)
standardizes each column to zero mean and unit variance, i.e. z = (x - mean) / std computed per column.
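To check what StandardScaler computes, here is a sketch comparing it with the z-score done by hand on a made-up array. StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default for std:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

scaled = StandardScaler().fit_transform(X)

# Same result by hand: subtract each column's mean, divide by its standard deviation
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))  # True
```

After scaling, every column has mean about 0 and standard deviation 1, so columns measured on very different scales contribute comparably to distance-based clustering.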
Clus_dataSet
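displays the resulting scaled array (if you are running this in a Jupyter notebook, a bare variable on the last line of a cell is printed).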
Be careful later when plotting your data with the X variable: the column indices of X are shifted by one relative to df, so X[:, 0] corresponds to the second column of df.
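For example: plt.scatter(X[:, 0], X[:, 3], s=area, c=labels.astype(float), alpha=0.5) (with area and labels coming from your own clustering code; np.float was removed in NumPy 1.24, so use the built-in float)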
Here X[:, 0]
is the first column of the new array, which was the second column of df (df.iloc[:, 1]).
Hopefully that makes sense; it's a bit hard to explain.
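A short sketch of the index shift, using a hypothetical DataFrame whose first column is an id that the slice drops:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],          # hypothetical first column, dropped by the slice
    "col_a": [0.5, 1.5, 2.5],
    "col_b": [10.0, 20.0, 30.0],
})

X = df.values[:, 1:]

# Column positions in X are shifted by one relative to df
assert np.array_equal(X[:, 0], df.iloc[:, 1].to_numpy())  # X column 0 == df column 1 ("col_a")
assert np.array_equal(X[:, 1], df["col_b"].to_numpy())    # X column 1 == df column 2 ("col_b")

# Keeping the remaining column names around makes plot labels easier to get right
feature_names = df.columns[1:]
print(list(feature_names))  # ['col_a', 'col_b']
```

Using feature_names[i] as the axis label for X[:, i] keeps the labels aligned with the shifted positions.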
Upvotes: 2
Reputation: 1112
df.values gives us the DataFrame's values as a NumPy array. df.values[:, 1:] accesses the required values by indexing: it means all rows and all columns except the column at index 0 of the DataFrame.
Upvotes: 0
Reputation: 5183
df
is a DataFrame with several columns, and apparently the target values are in the first column.
df.values
returns a numpy array with the underlying data of the DataFrame, without any index or columns names.
[:, 1:]
is a slice of that array that returns all rows and every column starting from the second column (the first column is index 0).
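A small sketch showing that df.values drops the index and column labels and how the [:, 1:] slice behaves. The DataFrame and its row labels are hypothetical; the last line uses DataFrame.to_numpy(), which the pandas documentation recommends over .values in newer code:

```python
import pandas as pd

df = pd.DataFrame(
    {"target": [1, 0], "x1": [2, 3], "x2": [4, 5]},
    index=["row_a", "row_b"],  # hypothetical labels; .values discards them
)

arr = df.values              # plain 2-D NumPy array, no index or column names
print(type(arr).__name__)    # ndarray
print(arr.shape)             # (2, 3)

features = arr[:, 1:]        # ":" = all rows, "1:" = columns from index 1 to the end
print(features)
# [[2 4]
#  [3 5]]

# to_numpy() returns the same data here
assert (df.to_numpy() == arr).all()
```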
Upvotes: 4