user8493571

What is the input format for sk-learn classifiers?

I am new to both scikit and numpy/pandas, but I am familiar with Python and data processing in general. I am confused about what format the inputs to sk-learn classifiers should be. I have tried using a debugger to inspect example matrices used in tutorial examples of sk-learn, but they have a huge number of members and I can't figure out which ones are the data and which are derived.

Is there a reference specification somewhere that explains what an array must look like and how to construct it for it to be a valid input for sk-learn classifiers?

pythonic833

Sklearn expects your feature matrix X to be two-dimensional, with shape (n_samples, n_features): one row per sample and one column per feature. For example:

ind  feat1  feat2
0    2      1
1    1      2
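As a minimal sketch (using the toy values from the table; the variable names are illustrative), the same matrix can be built as a 2D NumPy array whose shape is (n_samples, n_features). Even a single feature has to be passed as a 2D column, which `reshape(-1, 1)` produces.

```python
import numpy as np

# Two samples (rows) and two features (columns): shape (n_samples, n_features)
X = np.array([[2, 1],
              [1, 2]])
print(X.shape)  # (2, 2)

# A single feature must still be 2D: one column, one row per sample
single_feature = np.array([3.0, 5.0, 7.0])
X_single = single_feature.reshape(-1, 1)
print(X_single.shape)  # (3, 1)
```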

You can pass either pandas DataFrames or NumPy arrays as input (other array-like objects, such as nested lists, also work).
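For example, the sketch below fits a `KNeighborsClassifier` (an arbitrary choice; other classifiers accept the same inputs) on the toy matrix passed three ways: as a NumPy array, as a pandas DataFrame, and as a plain nested list. The labels in `y` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labels, one per sample (row)
y = [0, 1]

X_array = np.array([[2, 1], [1, 2]])
X_frame = pd.DataFrame({"feat1": [2, 1], "feat2": [1, 2]})
X_list = [[2, 1], [1, 2]]

predictions = []
for X in (X_array, X_frame, X_list):
    # n_neighbors=1 because there are only two training samples
    clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    predictions.append(clf.predict(X).tolist())

print(predictions)  # [[0, 1], [0, 1], [0, 1]]
```

Each input type produces the same fitted model, because sklearn converts the data to a 2D numeric array internally.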

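If you are doing supervised learning (classification or regression), the target y must contain one value per row of X, i.e. y.shape[0] must equal X.shape[0]. Usually y is a 1D array of shape (n_samples,).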
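A quick check of that rule, as a sketch with made-up data: `y` is a 1D array with one label per row of `X`, and a length mismatch makes `fit` raise a `ValueError`. `LogisticRegression` is just an example estimator here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[2, 1], [1, 2], [3, 0], [0, 3]])  # shape (4, 2)
y = np.array([0, 1, 0, 1])                      # shape (4,): one label per row

clf = LogisticRegression().fit(X, y)
print(X.shape[0] == y.shape[0])  # True

# One label short: scikit-learn rejects the inconsistent lengths
try:
    LogisticRegression().fit(X, y[:3])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(type(mismatch_error).__name__)  # ValueError
```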

You can load the example datasets that ship with sklearn and inspect the shapes of their matrices, because they are already in the format the estimators expect (the example below is a supervised regression problem):

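from sklearn.datasets import load_diabetes

# load_boston was removed in scikit-learn 1.2; load_diabetes is another regression dataset
X, y = load_diabetes(return_X_y=True)
print(X.shape[0] == y.shape[0])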

Output

True
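Since the question is about classifiers, the same inspection works on a classification dataset. A minimal sketch, using `load_iris` and `DecisionTreeClassifier` as illustrative choices (neither comes from the answer above):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
print(X.shape)  # (150, 4): 150 samples, 4 features
print(y.shape)  # (150,): one class label per sample

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
# predict expects the same 2D layout, even for a single sample
print(clf.predict(X[:1]).shape)  # (1,)
```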
