Reputation: 2745
I am working on a classification algorithm, and I get different string codes that follow some pattern.
| Column 1 | Column 2 | Column 3 |
|:-----------|------------:|:------------:|
| MN009 | JIK9PO | LEFTu |
| MN010 | JIK9POS | LEFTu |
| MN011 | JIK9POKI | LEFTu |
| MN012 | KIJU | LEFTu |
| MN013 | RANDOM | LEFTu |
| MN014 | FT | LEFTu |
For columns 1 and 3, the feature set can be a vector of length 5.
But I do not know how to create a feature set that can accommodate column 2 as well, since its codes vary in length.
Considerations:
Hope I am clear with the question. Thanks :)
Upvotes: 1
Views: 1013
Reputation: 854
Have a look at the PyTorch docs: `pack_padded_sequence` helps you avoid dynamic graphs and allows the network to disregard padded input. This would be straightforward to implement.
Packs a Variable containing padded sequences of variable length.
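A minimal sketch of how this could look for the column 2 codes, assuming PyTorch is installed. The character vocabulary, the embedding size, and the GRU are illustrative assumptions, not something specified in the question:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_sequence

# Column 2 codes from the question.
codes = ["JIK9PO", "JIK9POS", "JIK9POKI", "KIJU", "RANDOM", "FT"]

# Assumed encoding: one integer per character, with 0 reserved for padding.
vocab = {ch: i + 1 for i, ch in enumerate(sorted(set("".join(codes))))}
sequences = [torch.tensor([vocab[ch] for ch in code]) for code in codes]
lengths = torch.tensor([len(seq) for seq in sequences])

# Zero-pad every code to the longest one: shape (batch, max_len).
padded = pad_sequence(sequences, batch_first=True, padding_value=0)

# Assumed model pieces: a small embedding followed by a GRU.
embedding = torch.nn.Embedding(len(vocab) + 1, 8, padding_idx=0)
rnn = torch.nn.GRU(input_size=8, hidden_size=16, batch_first=True)

# Packing tells the GRU the true length of each code, so padded steps are skipped.
packed = pack_padded_sequence(
    embedding(padded), lengths, batch_first=True, enforce_sorted=False
)
_, hidden = rnn(packed)

# One fixed-size feature vector per code, regardless of its original length.
features = hidden[-1]
print(padded.shape, features.shape)
```

The resulting `features` rows could then be concatenated with the fixed-length vectors for columns 1 and 3 before the classifier.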
Upvotes: 1
Reputation: 3453
There are two solutions:
The one you mentioned; predefine a length, zero-padding sequences that fall short of it. This length can either be set to:
or to a shorter length (information loss ⇒ predictive power penalty). Information loss stems from either ignoring sequences above that length or truncating them and using their cut-down versions.
In both cases you should probably quantify the impact of your choice (i.e. how much information have I discarded from my data by discarding/truncating, or how much larger is my problem space compared to if I used a smaller length).
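As a rough illustration of quantifying that trade-off, the sketch below pads or truncates the column 2 codes to a fixed length and reports how much is discarded or added. The helper names `pad_or_truncate` and `truncation_report`, and the `_` padding character, are hypothetical choices made for this example:

```python
def pad_or_truncate(code, length, pad_char="_"):
    """Cut `code` down, or right-pad it, to exactly `length` characters."""
    return code[:length].ljust(length, pad_char)


def truncation_report(codes, length):
    """Summarise what a fixed `length` discards (truncation) and adds (padding)."""
    total_chars = sum(len(c) for c in codes)
    dropped = sum(max(len(c) - length, 0) for c in codes)
    padding = sum(max(length - len(c), 0) for c in codes)
    return {
        "truncated_codes": sum(len(c) > length for c in codes),
        "dropped_char_fraction": dropped / total_chars,
        "padding_fraction": padding / (length * len(codes)),
    }


# Column 2 codes from the question.
codes = ["JIK9PO", "JIK9POS", "JIK9POKI", "KIJU", "RANDOM", "FT"]

for length in (6, 8):
    print(length, [pad_or_truncate(c, length) for c in codes])
    print(length, truncation_report(codes, length))
```

Comparing the reports for several candidate lengths gives a simple view of how much information each choice throws away against how much of the input becomes padding.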
Upvotes: 2