Rui

Reputation: 127

How to convert a tree-structured feature to a vector feature

I have a data set with several features. One of them is categorical, but its values have a tree structure. For example, suppose this categorical feature takes the values a, b, c, d, e, f, g, h, i, j, k; the following image shows the tree relationship among the values: (image: tree diagram of the values)

The raw feature does not incorporate this relationship (it occupies only one column). Now I want to incorporate the relationship, but I still want the feature to be in vector form.
My solution is to create a binary column for each node, so in this example the feature can be represented by a binary vector of length 11. A feature value equal to e is then represented as <1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0> (shown below):
(image: vector representation of the value e)
Here the 1st element indicates the first-level node b; the 2nd element indicates the second-level node a; the 3rd, 4th, 5th, and 6th elements indicate the third-level nodes d, e, g, and j respectively; the 7th element indicates the second-level node c; and the 8th, 9th, 10th, and 11th elements indicate the third-level nodes f, h, i, and k respectively.
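A minimal sketch of the encoding I have in mind, in Python. The parent map is my reading of the tree from the element description (b at the root, a and c below it, the remaining values as leaves), and the names `PARENT`, `COLUMNS`, and `encode` are only illustrative:

```python
# Assumed tree, reconstructed from the description: child -> parent.
PARENT = {
    "b": None,
    "a": "b", "c": "b",
    "d": "a", "e": "a", "g": "a", "j": "a",
    "f": "c", "h": "c", "i": "c", "k": "c",
}

# Column order used in the example vector.
COLUMNS = ["b", "a", "d", "e", "g", "j", "c", "f", "h", "i", "k"]


def encode(value):
    """Return a 0/1 list with a 1 for `value` and for each of its ancestors."""
    active = set()
    node = value
    while node is not None:  # walk up to the root
        active.add(node)
        node = PARENT[node]
    return [1 if col in active else 0 for col in COLUMNS]


print(encode("e"))  # [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

Each value switches on its own column plus the columns of all its ancestors, so two values that share a parent also share a prefix of ones.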
The reason I think this would work is that the tree can be recovered from this vector representation, so I believe no information is lost in the transformation.
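To make that claim concrete, here is a rough sketch of how I would rebuild the parent of each value from the vectors alone. It assumes one encoded row is available for every node (including the internal nodes a, b, and c); the rows are written out by hand from my tree, and `ROWS` and `recover_parents` are made-up names:

```python
COLUMNS = ["b", "a", "d", "e", "g", "j", "c", "f", "h", "i", "k"]

# One encoded row per value, written out by hand (assumed, not from real data).
ROWS = {
    "b": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "a": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "d": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "e": [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "g": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "j": [1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "c": [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "f": [1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    "h": [1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
    "i": [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    "k": [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
}


def recover_parents(rows):
    """Rebuild child -> parent using only the encoded rows."""
    parents = {}
    for value, row in rows.items():
        # Every other column set to 1 in this row is an ancestor of `value`.
        ancestors = [c for c, bit in zip(COLUMNS, row) if bit and c != value]
        # The nearest ancestor is the deepest one, i.e. the one whose own
        # row contains the most ones; the root has no ancestors at all.
        parents[value] = max(ancestors, key=lambda n: sum(rows[n]), default=None)
    return parents


parents = recover_parents(ROWS)
print(parents["e"], parents["a"], parents["b"])  # a b None
```

If some internal node never appears as a row, this particular recovery no longer works as written, so the argument depends on that assumption.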
The main purpose of this transformation is that I want to run a machine learning algorithm on this data set, so I want the data set to be more informative.
I want to know whether this transformation is valid and, if it is not, why. Is there a better way to do this?

Upvotes: 1

Views: 232

Answers (0)
