Reputation: 4361
I have a folder called 'emails' with two subfolders, each named after the class of the files it contains (spam or notspam; all are .txt files). There are 3000 files across the two subfolders. Using load_files:
from sklearn.datasets import load_files
# shuffle expects a boolean, not the string 'False'
data = load_files('emails', shuffle=False)
print(len(data))
print(len(data.target))
This prints '5' and then '3000'. How can the length of data only be 5 if it found 3000 classification labels?
Upvotes: 1
Views: 2344
Reputation: 1060
Your data is stored in data.data and the targets in data.target. Try print(len(data.data)) instead.
load_files() simply returns a sklearn.datasets.base.Bunch, which is a simple data wrapper. So, data is in this format:
{
'DESCR': None,
'data': [],
'filenames': array(),
'target': array(),
'target_names': []
}
This is why len(data) returns 5: a Bunch is a dict subclass, so len() counts its five keys, not the number of documents.
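To see the difference between the two lengths without needing the 'emails' folder, here is a minimal sketch. MiniBunch is a hypothetical stand-in that I wrote for illustration (a dict with attribute access, which is roughly how Bunch behaves); it is not sklearn's actual class, and the file contents and labels are placeholders.

```python
class MiniBunch(dict):
    """Hypothetical stand-in for sklearn's Bunch: a dict with attribute access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


# Placeholder values shaped like the result of load_files on 3000 files.
data = MiniBunch(
    DESCR=None,
    data=["text of email %d" % i for i in range(3000)],
    filenames=["emails/spam/%d.txt" % i for i in range(3000)],
    target=[i % 2 for i in range(3000)],
    target_names=["notspam", "spam"],
)

n_keys = len(data)           # number of keys in the wrapper
n_docs = len(data.data)      # number of documents
n_labels = len(data.target)  # number of labels

print(n_keys)
print(n_docs)
print(n_labels)
```

With the real object returned by load_files, len(data.data) and len(data.target) should both match the number of files found.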
Hope this helps!
Upvotes: 3