I'm wondering how to match the labels produced by an SVM classifier with the ones in my dataset. And then I realized that the problem starts at the beginning: when I load the dataset, I get an object that, in my case, has the following properties:
- `.data`: the news texts
- `.target_names`: the labels used in the dataset, e.g. `["positive", "negative"]`
- `.target`: an array with one number per news item, encoding its label
But I'm wondering whether the order of `target_names` differs across datasets (with the same tags but different news), and whether the order of the `.data` elements influences it.
Is there an easy way to know which label a number in the `.target` array stands for? (I mean, what do 0 and 1 represent in such an array?)
Best,
The label corresponding to a value `i` stored in `.target` is available as `.target_names[i]`. In your example, `.target_names[1]` is `"negative"`.
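To make the lookup concrete, here is a minimal sketch using hypothetical values in place of a loaded dataset (the list order follows the example in the question). The helper `decode` is illustrative, not part of sklearn. It also covers the original goal of reading a classifier's output: assuming the SVM was fit on `.target`, its `predict()` returns the same integer labels, so the same lookup applies.

```python
# Hypothetical stand-ins for a loaded dataset's .target_names and .target.
target_names = ["positive", "negative"]
target = [1, 0, 0, 1]


def decode(labels, names):
    """Map integer class labels to their names: value i -> names[i]."""
    return [names[i] for i in labels]


decoded = decode(target, target_names)
print(decoded)  # ['negative', 'positive', 'positive', 'negative']

# Hypothetical predictions from a classifier fit on the same integer labels.
predicted = [1, 1, 0, 1]
decoded_predictions = decode(predicted, target_names)
print(decoded_predictions)  # ['negative', 'negative', 'positive', 'negative']
```

With NumPy arrays, `np.asarray(target_names)[target]` performs the same mapping in one vectorized step.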
The order of the target names will be the same across different datasets as long as the tags are exactly the same, and the order of the `.data` elements has no influence on it. This is because `sklearn.datasets.load_files()` creates the tags from the sorted folder names, as we can see in the source code (v0.20.x):
```python
[...]
folders = [f for f in sorted(listdir(container_path))
           if isdir(join(container_path, f))]

if categories is not None:
    folders = [f for f in folders if f in categories]

for label, folder in enumerate(folders):
    target_names.append(folder)
[...]
```
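The consequence of that sorting can be checked without sklearn. The sketch below re-implements only the folder-to-label logic from the excerpt (the function names `target_names_from_folders` and `names_for` are hypothetical, and this is not sklearn's actual code) and runs it on throwaway folder trees whose category folders are created in different orders.

```python
import os
import tempfile
from os import listdir
from os.path import isdir, join


def target_names_from_folders(container_path, categories=None):
    """Sketch of the excerpted folder-to-label logic (not sklearn itself)."""
    # Only subdirectories count, and they are visited in sorted order.
    folders = [f for f in sorted(listdir(container_path))
               if isdir(join(container_path, f))]
    if categories is not None:
        # Filtering keeps the sorted order; the order of `categories` is ignored.
        folders = [f for f in folders if f in categories]
    # Each category's integer label is its position in this list.
    return list(folders)


def names_for(folder_creation_order, categories=None):
    """Build a temporary folder tree and return the target names derived from it."""
    with tempfile.TemporaryDirectory() as root:
        for name in folder_creation_order:
            os.makedirs(join(root, name))
        # A top-level file is skipped by the isdir() filter.
        with open(join(root, "README.txt"), "w") as fh:
            fh.write("not a category\n")
        return target_names_from_folders(root, categories)


a = names_for(["positive", "negative"])
b = names_for(["negative", "positive"])
c = names_for(["positive", "neutral", "negative"],
              categories=["positive", "negative"])
print(a, b, c)  # ['negative', 'positive'] ['negative', 'positive'] ['negative', 'positive']
```

Creation order makes no difference, and passing `categories` in a different order does not change the result either. Note that sorting means real `load_files()` output would list `negative` before `positive`.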
I'd still suggest always retrieving the label from the `target_names` of the current dataset, to be on the safe side (implementations may change over time, etc.).
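Following that advice, if two datasets ever do encode their labels differently, going through the names keeps them consistent. A minimal sketch, assuming both datasets expose a `target_names` list; `remap` and the example orders are hypothetical.

```python
def remap(labels, source_names, dest_names):
    """Translate integer labels encoded against source_names into dest_names' encoding."""
    # Look up each destination label by name rather than trusting positions.
    index = {name: i for i, name in enumerate(dest_names)}
    return [index[source_names[i]] for i in labels]


train_names = ["negative", "positive"]
test_names = ["positive", "negative"]  # hypothetical: same tags, different order
test_target = [0, 1, 1]                # i.e. positive, negative, negative

remapped = remap(test_target, test_names, train_names)
print(remapped)  # [1, 0, 0]
```

A tag that exists in only one of the datasets raises a `KeyError` here, which is usually preferable to silently mislabeling it.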