user6818

Reputation: 105

How does one call external datasets into scikit-learn?

For example, consider these datasets:

(1) https://archive.ics.uci.edu/ml/machine-learning-databases/annealing/anneal.data

Or

(2) http://data.worldbank.org/topic

How does one load such external datasets into scikit-learn in order to work with them?


The only way of loading a dataset that I have seen in scikit-learn is through a command like:

from sklearn.datasets import load_digits

digits = load_digits()
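
For context, here is a small sketch of what `load_digits` hands back. Any external dataset ultimately has to be turned into the same two pieces: a 2-D feature array and a 1-D target array. The names `X` and `y` are only the conventional choice here, not something the question itself uses.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# The returned object bundles the features and the labels together.
X = digits.data    # 2-D NumPy array, one row per sample
y = digits.target  # 1-D NumPy array, one label per sample

print(X.shape, y.shape)
```

So the real task with an external file is getting it into arrays (or a data frame) of this shape.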

Upvotes: 0

Views: 972

Answers (2)

Davis Anunda

Reputation: 39

This works only for datasets that scikit-learn itself ships a fetch function for. In that case, run the following command and replace 'EXTERNALDATASETNAME' with the name of that dataset:

import sklearn.datasets 
data = sklearn.datasets.fetch_EXTERNALDATASETNAME()
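
Since the `fetch_` pattern only covers names that scikit-learn actually provides, one way to check what is available in an installed version is to list them. This is a minimal sketch; `available_fetchers` is just an illustrative variable name, and the exact list depends on the scikit-learn version.

```python
import sklearn.datasets

# Collect every public name in sklearn.datasets that starts with "fetch_".
available_fetchers = sorted(
    name for name in dir(sklearn.datasets) if name.startswith("fetch_")
)

print(available_fetchers)
```

If the dataset is not in that list (the UCI and World Bank links in the question are arbitrary external files), it has to be loaded some other way.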

Upvotes: 0

Matthew Drury

Reputation: 1095

You need to learn a little pandas, which is a data frame library for Python. Then you can do:

import pandas
my_data_frame = pandas.read_csv("/path/to/my/data")
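
For a file like the UCI one in the question, a slightly fuller sketch might look like the following. It assumes the file is comma-separated, has no header row, and marks missing values with `?`. That is common for UCI `.data` files, but it should be checked against the dataset's description. The helper name `read_headerless_csv` is hypothetical, and the two sample rows are made up for illustration, not copied from the real file.

```python
import io

import pandas as pd


def read_headerless_csv(source):
    """Read a comma-separated source with no header row, treating '?' as missing."""
    return pd.read_csv(source, header=None, na_values="?")


# Made-up stand-in rows (NOT real data) so the sketch runs without a download.
sample = io.StringIO("?,C,A,8,0\n?,R,A,0,45\n")
frame = read_headerless_csv(sample)

print(frame.shape)
print(frame.isna().sum())
```

`pandas.read_csv` also accepts a URL, so the same helper could be called with the `anneal.data` address from the question instead of the in-memory sample.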

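To create model matrices from your data frame, I recommend the patsy library, which implements a model specification language similar to R formulas:

import patsy
# dmatrices accepts a two-sided formula and returns (response, design matrix);
# dmatrix only accepts a formula with no left-hand side
response, model_frame = patsy.dmatrices("my_response ~ my_model_formula", my_data_frame)

Then the model frame can be passed in as X, and the response as y, to the various sklearn models.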
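
To make the patsy step concrete, here is a small end-to-end sketch on synthetic data. The column names `x1`, `x2`, `y` and the use of `LinearRegression` are assumptions chosen for illustration. Because patsy adds an intercept column to the design matrix by default, the sketch turns off scikit-learn's own intercept to avoid fitting it twice.

```python
import numpy as np
import pandas as pd
import patsy
from sklearn.linear_model import LinearRegression

# Synthetic data standing in for a real data frame.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
df["y"] = 2.0 * df["x1"] - 1.0 * df["x2"] + rng.normal(scale=0.1, size=50)

# Two-sided formula: the left of "~" is the response, the right is the predictors.
y_matrix, X_matrix = patsy.dmatrices("y ~ x1 + x2", df)

# Convert to plain NumPy arrays; sklearn expects y to be 1-D here.
X = np.asarray(X_matrix)
y = np.asarray(y_matrix).ravel()

# The design matrix already has an "Intercept" column, so skip sklearn's own.
model = LinearRegression(fit_intercept=False).fit(X, y)
print(X_matrix.design_info.column_names)
print(model.coef_)
```

Alternatively, the formula can be written as `"y ~ x1 + x2 - 1"` to drop patsy's intercept column and leave `fit_intercept` at its default.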

Upvotes: 1
