Reputation: 1252
I am trying to fetch this dataset using the code below.
from sklearn.datasets import fetch_20newsgroups
twenty_train = fetch_20newsgroups(subset='train')
However, an error occurs after this, and the program is then killed.
No handlers could be found for logger "sklearn.datasets.twenty_newsgroups"
Killed
I later tried to load the files manually like this:
from sklearn.datasets import load_files
twenty_train = load_files('/root/scikit_learn_data/20news_home/20news-bydate-train')
and like this:
twenty_train = load_files('/root/scikit_learn_data/20news_home/20news-bydate-train',encoding='latin1')
Only the former call (without the encoding argument) works.
Upvotes: 2
Views: 4423
Reputation: 3391
It looks like scikit-learn is trying to report an error, but you have not configured where logging output should go. I had exactly the same problem when I tried your code. I fixed it by setting up a logger:
import logging
logging.basicConfig()
Now trying to load the dataset gives me the following warning:
WARNING:sklearn.datasets.twenty_newsgroups:Download was incomplete, downloading again.
WARNING:sklearn.datasets.twenty_newsgroups:Downloading dataset from http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz (14 MB)
Once the download (14 MB) has completed on your system, the dataset will be loaded into your twenty_train variable.
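To see why configuring a handler makes the message visible, here is a minimal sketch that uses only the standard library and does not call scikit-learn at all. It attaches a handler to the root logger and then logs through a logger with the same name as the one in the error message. The in-memory stream and the explicit format string are assumptions for illustration (logging.basicConfig() with no arguments writes to stderr instead), and the warning text is simply copied from the output above.

```python
import io
import logging

# Assumption for illustration: capture log output in memory so it can be
# inspected. logging.basicConfig() with no arguments writes to stderr instead.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

# Attach the handler to the root logger; records from child loggers
# propagate up to it.
logging.getLogger().addHandler(handler)

# Logger name taken from the error message in the question.
logger = logging.getLogger("sklearn.datasets.twenty_newsgroups")
logger.warning("Download was incomplete, downloading again.")

output = stream.getvalue()
print(output, end="")
```

Because the named logger has no handler of its own, the record propagates to the root logger, which is why a single logging.basicConfig() call before fetch_20newsgroups is enough to surface the warnings.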
Hope this helps!
Upvotes: 3