Reputation: 11
I have a set of multi-valued features that are linked together. As an example,
| ItemCodes | Scores |
|---|---|
| AK, NA, UY | 0.6, 0.2, 0.2 |
| KG, AK | 0.5, 0.5 |
Each item has a corresponding score. Some rows might not have any items or scores. I want to convert this dataset (which also has other numerical features) into a form that can be fed to a neural net. The data comes from a different part of the system with its own API.
I was trying to create a vector of item codes (binary, if the item is present or not) and a second vector with score values at the corresponding indices. If I only had Item codes, I could do a multi-hot encoding and get a feature vector of items. So far I am using
values = tft.compute_and_apply_vocabulary(itemcodes)
to get the indices, which I can then set to 1 in the output Tensor. But if item AK is allotted index j in the multi-hot Tensor, how do I ensure that 0.6 is also set at index j in the second Tensor? Because there can be missing values, the ItemCodes and Scores are available as SparseTensors, so I am unable to iterate over them directly. How can I achieve what I want? Or is there a better way to represent such features?
Upvotes: 1
Views: 44
Reputation: 326
I think the fundamental question is what each row represents. If the ItemCodes are static, you can create a lookup table and place the associated score at the right index. If you provide more code or details, I can likely help you with this.
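To make the lookup-table idea concrete, here is a minimal sketch in plain Python rather than TensorFlow Transform. The `VOCAB` mapping and the `encode_row` helper are hypothetical names made up for illustration, and the sketch assumes the item codes are known ahead of time and that each row's codes and scores arrive in the same order.

```python
# Hypothetical static vocabulary: item code -> fixed column index.
# Assumption: the full set of item codes is known in advance.
VOCAB = {"AK": 0, "KG": 1, "NA": 2, "UY": 3}


def encode_row(item_codes, scores, vocab=VOCAB):
    """Return (multi_hot, score_vector), both of length len(vocab)."""
    if len(item_codes) != len(scores):
        raise ValueError("item_codes and scores must have the same length")
    multi_hot = [0.0] * len(vocab)
    score_vec = [0.0] * len(vocab)
    # Codes and scores are paired by position, so each score is written
    # at the same index as the presence flag of its item code.
    for code, score in zip(item_codes, scores):
        idx = vocab.get(code)
        if idx is None:
            continue  # out-of-vocabulary code: skipped in this sketch
        multi_hot[idx] = 1.0
        score_vec[idx] = score
    return multi_hot, score_vec


rows = [
    (["AK", "NA", "UY"], [0.6, 0.2, 0.2]),
    (["KG", "AK"], [0.5, 0.5]),
    ([], []),  # a row with no items or scores stays all zeros
]
encoded = [encode_row(codes, scores) for codes, scores in rows]
for multi_hot, score_vec in encoded:
    print(multi_hot, score_vec)
```

If the scores are always non-zero for present items, the score vector alone already tells you which items are present, so the separate multi-hot vector may be redundant. Whether that holds depends on your data.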
Upvotes: 0