edwin

Reputation: 1252

Unable to fetch 20 NewsGroups dataset in Scikit-Learn

I am trying to fetch this dataset using the code below.

from sklearn.datasets import fetch_20newsgroups
twenty_train = fetch_20newsgroups(subset='train')

However, an error occurs when this runs, and the program is then killed.

No handlers could be found for logger "sklearn.datasets.twenty_newsgroups"
Killed

I then tried to load the files manually, like this:

from sklearn.datasets import load_files
twenty_train = load_files('/root/scikit_learn_data/20news_home/20news-bydate-train')

and like this:

twenty_train = load_files('/root/scikit_learn_data/20news_home/20news-bydate-train', encoding='latin1')

Only the first call, the one without the encoding argument, works.

Upvotes: 2

Views: 4423

Answers (1)

Abhinav Arora

Reputation: 3391

It looks like scikit-learn is trying to report something through its logger, but you have not configured where log output should go. I had exactly the same problem when I tried your code, and I fixed it by setting up logging:

import logging
logging.basicConfig()

Now trying to load the dataset gives me the following warning:

WARNING:sklearn.datasets.twenty_newsgroups:Download was incomplete, downloading again.
WARNING:sklearn.datasets.twenty_newsgroups:Downloading dataset from http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz (14 MB)

Once the download (14 MB) has completed on your system, the dataset will be loaded into your twenty_train variable.
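
Putting the steps above together, here is a minimal sketch, assuming Python 3 and an installed scikit-learn. The helper names configure_logging and load_training_set are hypothetical, chosen only for illustration. The logger name is copied from the error message in the question and may differ between scikit-learn versions. Because the handler is attached to the root logger, the messages should be shown either way.

```python
import logging


def configure_logging():
    """Install a root handler so library log records are printed, not dropped.

    Hypothetical helper, not part of scikit-learn.
    """
    logging.basicConfig(
        level=logging.INFO,
        format="%(levelname)s:%(name)s:%(message)s",
    )
    # Logger name copied from the error message in the question.
    return logging.getLogger("sklearn.datasets.twenty_newsgroups")


def load_training_set():
    """Fetch the training split (hypothetical helper).

    Downloads the archive if it is not already cached.
    """
    # Imported lazily so logging is configured before scikit-learn runs.
    from sklearn.datasets import fetch_20newsgroups
    return fetch_20newsgroups(subset="train")


logger = configure_logging()
```

Calling twenty_train = load_training_set() after this should print the download warnings instead of the "No handlers could be found" message. The documents are then available in twenty_train.data.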

Hope this helps!

Upvotes: 3
