Reputation: 67
I am going to do some work for transition-based dependency parsing using LIBLINEAR. But I am confused how to utilize it. As follows:
I set 3 feature templates for my training&testing processes of transition-based dependency parsing:
1. the word in the top of the stack
2. the word in the front of the queue
3. information from the current tree formed with the steps
And the feature defined in LIBLINEAR is:
FeatureNode(int index, double value)
Some examples like:
LABEL ATTR1 ATTR2 ATTR3 ATTR4 ATTR5
----- ----- ----- ----- ----- -----
1 0 0.1 0.2 0 0
2 0 0.1 0.3 -1.2 0
1 0.4 0 0 0 0
2 0 0.1 0 1.4 0.5
3 -0.1 -0.2 0.1 1.1 0.1
But I want to define my features like(one sentence 'I love you') at some stage:
feature template 1: the word is 'love'
feature template 2: the word is 'you'
feature template 3: the information is - the left son of 'love' is 'I'
Does it mean I must define features with LIBLINEAR like: -------FORMAT 1 (indexes in vocabulary: 0-I, 1-love, 2-you)
LABEL ATTR1(template1) ATTR2(template2) ATTR3(template3)
----- ----- ----- -----
SHIFT 1 2 0
(or LEFT-arc,
RIGHT-arc)
But I have go thought some statements of others, I seem to define feature in binary so I have to define a words vector like: ('I', 'love', 'you'), when 'you' appears for example, the vector will be (0, 0, 1)
So the features in LIBLINEAR may be: -------FORMAT 2
LABEL ATTR1('I') ATTR2('love') ATTR3('love')
----- ----- ----- -----
SHIFT 0 1 0 ->denoting the feature template 1
(or LEFT-arc,
RIGHT-arc)
SHIFT 0 0 1 ->denoting the feature template 2
(or LEFT-arc,
RIGHT-arc)
SHIFT 1 0 0 ->denoting the feature template 3
(or LEFT-arc,
RIGHT-arc)
Which is correct between FORMAT 1 and 2?
Is there some something I have mistaken?
Upvotes: 1
Views: 78
Reputation: 390
Basically you have a feature vector of the form:
LABEL RESULT_OF_FEATURE_TEMPLATE_1 RESULT_OF_FEATURE_TEMPLATE_2 RESULT_OF_FEATURE_TEMPLATE_3
Liblinear or LibSVM expect you to translate it into integer representation:
1 1:1 2:1 3:1
Nowadays, depending on the language you use there are lots of packages/libraries, which would translate the string vector into libsvm format automatically, without you having to know the details.
However, if for whatever reason you want to do it yourself, the easiest thing would be maintain two mappings: one mapping for labels ('shift' -> 1, 'left-arc' -> 2, 'right-arc' -> 3, 'reduce' -> 4). And one for your feature template result ('f1=I' -> 1, 'f2=love' -> 2, 'f3=you' -> 3). Basically every time your algorithms applies a feature template you check whether the result is already in the mapping and if not you add it with a new index.
Remember that Liblinear or Libsvm expect a sorted list in ascending order.
During processing you would first apply your feature templates to the current state of your stacks and then translate the strings to the libsvm/liblinear integer representation and sort the indexes in ascending order.
Upvotes: 1