Reputation: 3267
I've never used Python before, and I find myself in dire need of using the sklearn module in my Node.js project for machine learning purposes.
I have spent all day trying to understand the code examples for that module, and now that I roughly understand how they work, I don't know how to use my own data set.
Each of the built-in data sets has its own function (load_iris, load_wine, load_breast_cancer, etc.) and they all load data from a .csv and an .rst file. I can't find a function that will allow me to load my own data set. (There's a load_data function, but it seems to be for internal use by the previous three I mentioned, because I can't import it.)
How could I do that? What's the proper way to use sklearn with any other data set? Does it always have to be a .csv file? Could it be programmatically provided data (array, object, etc)?
In case it's important: all those built-in data sets have numeric features, my data set has both numeric and string features to be used in the decision tree.
Thanks
Upvotes: 1
Views: 630
Reputation: 33147
You can load whatever you want and then use sklearn models.
If you have a .csv file, pandas would be the best option.
import pandas as pd
mydataset = pd.read_csv("dataset.csv")
X = mydataset.values[:, 0:10]  # let's assume that the first 10 columns are the features/variables
y = mydataset.values[:, 10]  # let's assume that the 11th column (index 10) has the target values/classes
...
# sklearn_model stands for any sklearn estimator you have created, e.g. a decision tree
sklearn_model.fit(X, y)
Similarly, you can load .txt or .xls files.
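As a rough sketch of the .txt case, assuming the file is tab-separated with a header row and the target in the last column: the column names and values below are made up for illustration, and io.StringIO only stands in for a real file path so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for a tab-separated .txt file on disk (hypothetical contents).
text_file = io.StringIO(
    "height\tweight\tlabel\n"
    "1.70\t65\t0\n"
    "1.82\t80\t1\n"
    "1.60\t55\t0\n"
)

# read_csv reads any delimited text file; sep selects the delimiter.
mydataset = pd.read_csv(text_file, sep="\t")

X = mydataset.iloc[:, :-1].values  # every column except the last one
y = mydataset.iloc[:, -1].values   # the last column as the target
```

With a real file you would pass its path instead of text_file. For .xls files, pd.read_excel plays the same role, though it needs an Excel reader package installed.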
The important thing in order to use sklearn models is this:
X should always be a 2D array with shape [n_samples, n_variables].
y should be the target variable.
Upvotes: 2
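To illustrate the shape requirement with data built in code rather than read from a file, and with a string feature as in the question, here is a minimal sketch. The column names and values are invented, and one-hot encoding is just one reasonable way to turn strings into numbers (sklearn decision trees need numeric input); it is an assumption, not something the answer above prescribes.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical in-memory data: one row per sample, one column per feature.
data = pd.DataFrame({
    "age": [25, 47, 35, 52, 23, 40],
    "income": [30000, 82000, 56000, 91000, 27000, 60000],
    "city": ["Paris", "Lima", "Paris", "Oslo", "Lima", "Oslo"],
    "bought": ["no", "yes", "no", "yes", "no", "yes"],
})

X = data[["age", "income", "city"]]  # 2D: shape (n_samples, n_variables)
y = data["bought"]                   # 1D: one target value per sample

# One-hot encode the string column; pass numeric columns through unchanged.
encode_strings = ColumnTransformer(
    [("city", OneHotEncoder(handle_unknown="ignore"), ["city"])],
    remainder="passthrough",
)
model = make_pipeline(encode_strings, DecisionTreeClassifier(random_state=0))
model.fit(X, y)

# New samples must have the same feature columns as the training data.
new_sample = pd.DataFrame({"age": [30], "income": [45000], "city": ["Paris"]})
prediction = model.predict(new_sample)
```

Putting the encoder and the tree in one pipeline means new samples go through the same encoding automatically at prediction time.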