Hui Yang ONG

Reputation: 111

What is df.values[:,1:]?

import numpy as np
from sklearn.preprocessing import StandardScaler
X = df.values[:,1:]
X = np.nan_to_num(X)
Clus_dataSet = StandardScaler().fit_transform(X)
Clus_dataSet

Does anyone understand what this code means?


Upvotes: 2

Views: 25711

Answers (4)

Nikhil Rana

Reputation: 11

df here refers to the DataFrame you are analysing.

In the second line of your code, df.values returns just the values of the DataFrame, without the index. Inside the brackets, [:, 1:] selects all the rows and skips the column at index position 0, i.e. the first column (which is probably the dependent variable, I assume).

Upvotes: 1

nathankouts

Reputation: 21

As Richie said, with X = df.values[:,1:] you basically make X equal to your dataframe's values, but skipping the first column.

X = np.nan_to_num(X) replaces any NaN values with 0 (and infinite values with very large finite numbers).

Clus_dataSet = StandardScaler().fit_transform(X) standardizes the data, so each column ends up with mean 0 and unit variance.

Clus_dataSet on the last line just displays the resulting array.
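To make the two middle steps concrete, here is a minimal sketch on a made-up 3x2 array (the values are hypothetical). With its default settings, StandardScaler subtracts each column's mean and divides by that column's population standard deviation; the plain NumPy line below reproduces that arithmetic so the example runs without scikit-learn:

```python
import numpy as np

# Hypothetical data with one missing value
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0]])

# By default NaN is replaced with 0.0
X = np.nan_to_num(X)

# Same arithmetic as StandardScaler().fit_transform(X) with default settings:
# subtract the per-column mean, divide by the per-column standard deviation
scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X[1, 0])               # the former NaN
print(scaled.mean(axis=0))   # approximately 0 for every column
print(scaled.std(axis=0))    # approximately 1 for every column
```

Note that filling NaN with 0 happens before scaling, so the zeros take part in the mean and standard deviation of their column.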

Be careful later when plotting: if you use the X variable, its columns are shifted by one relative to the dataframe, so X[:, 0] corresponds to df.iloc[:, 1].

For example: plt.scatter(X[:, 0], X[:, 3], s=area, c=labels.astype(float), alpha=0.5)

Here X[:, 0] contains the first column of the new variable, which was previously the second column of df (df.iloc[:, 1]), if that makes sense. Kinda hard to explain.
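The column shift can be checked directly. This is only a sketch: the DataFrame and its column names (id, age, income) are invented for illustration and are not from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names are made up
df = pd.DataFrame({"id": [1, 2, 3],
                   "age": [25, 32, 47],
                   "income": [40, 55, 90]})

X = df.values[:, 1:]  # drop the first column ("id")

# Column 0 of X is column 1 of the DataFrame, column 1 of X is column 2, ...
same_first = np.array_equal(X[:, 0], df.iloc[:, 1].to_numpy())
same_second = np.array_equal(X[:, 1], df["income"].to_numpy())
print(same_first, same_second)
```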

Upvotes: 2

Kuldip Chaudhari

Reputation: 1112

df.values gives us the dataframe's values as a NumPy array. df.values[:, 1:] indexes into that array: it means all the rows, and all columns except the column at index 0.

Upvotes: 0

RichieV

Reputation: 5183

  • df is a DataFrame with several columns, and apparently the target values are in the first column.

  • df.values returns a numpy array with the underlying data of the DataFrame, without any index or column names.

  • [:, 1:] is a slice of that array that returns all rows and every column starting from the second column (the first column is index 0).
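The three points above can be sketched on a tiny DataFrame. The column names (target, f1, f2) and values are assumptions chosen for illustration, not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the target in the first column
df = pd.DataFrame({"target": [0, 1, 0],
                   "f1": [1.5, 2.5, 3.5],
                   "f2": [10.0, 20.0, 30.0]})

values = df.values   # plain NumPy array, no index or column names
X = values[:, 1:]    # all rows, every column from index 1 onward

print(type(values))
print(values.shape)  # rows x all columns
print(X.shape)       # rows x (all columns minus the first)
print(X[0])          # first row without the "target" value
```

The part before the comma selects rows (a bare : means all of them) and the part after it selects columns (1: means from index 1 to the end).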

Upvotes: 4
