Reputation: 75
I am classifying tweets using single-valued features (e.g. the number of followers of the user) and multiple-valued features (e.g. a long histogram from LDA or Bag-of-Words).
I simply concatenate the features, modelling each component of each multiple-valued feature as a Weka Attribute. I am using SVM and Naive Bayes.
The issue is this: I want to evaluate the attributes with Weka's classes, but I want each multiple-valued feature to be ranked as a single attribute. It makes no sense to learn that BoF_1342 is better than LDA_4103 and BoF_242; I only want to know whether BoF is better than LDA.
Does Weka support this kind of evaluation?
Upvotes: 1
Views: 325
Reputation: 174
I'm not sure WEKA supports this kind of aggregation. One solution would be to write a script that does it for you (though I'm not sure that is even possible in your scenario). For example, say you have three attributes (BoF_1342, LDA_4103, BoF_242) and four instances:
BoF_1342 0 - 0 - 0 - 1
LDA_4103 1 - 1 - 0 - 1
BoF_242 0 - 1 - 1 - 0
It would become (each group's columns merged with a logical OR per instance):
BoF 0 - 1 - 1 - 1
LDA 1 - 1 - 0 - 1
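A script along these lines could be sketched as follows. This is only a minimal illustration, under two assumptions: that the group is the part of the attribute name before the last underscore, and that columns are merged with a logical OR, which is the rule the toy example above appears to follow. The function name `merge_by_prefix` is hypothetical, and for real-valued histograms a different merge rule (a sum or a mean, say) would probably be needed.

```python
from collections import defaultdict


def merge_by_prefix(columns):
    """Collapse columns sharing a name prefix into one column per group.

    `columns` maps an attribute name (e.g. "BoF_1342") to its list of
    per-instance values. The group is the text before the last "_".
    Assumption: values are binary and are merged with a logical OR.
    """
    groups = defaultdict(list)
    for name, values in columns.items():
        prefix = name.rsplit("_", 1)[0]
        groups[prefix].append(values)

    # zip(*cols) walks the grouped columns instance by instance.
    return {
        prefix: [int(any(instance)) for instance in zip(*cols)]
        for prefix, cols in groups.items()
    }


# The three attributes and four instances from the example above.
columns = {
    "BoF_1342": [0, 0, 0, 1],
    "LDA_4103": [1, 1, 0, 1],
    "BoF_242": [0, 1, 1, 0],
}

merged = merge_by_prefix(columns)
print(merged)  # {'BoF': [0, 1, 1, 1], 'LDA': [1, 1, 0, 1]}
```

The merged columns could then be written back out as a new dataset with one attribute per group and evaluated in the usual way.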
Upvotes: 0