Miki Tebeka

Reputation: 13850

How to deal with combination of text and numeric features?

Looking at Kaggle's Job Salary Prediction, I see numeric features (like Category) and textual ones (like FullDescription).

How do I go about training on such data? I thought about vectorizing the text using TfidfTransformer; however, it creates a sparse matrix, which many learning algorithms (such as RandomForestRegressor) refuse to work with. Also, once I have the feature vector for the text, how do I combine it with the other features?

Any pointers on how to work with such data?

Thanks!

Upvotes: 6

Views: 1723

Answers (1)

ogrisel

Reputation: 40159

I would first fit a linear model on the tf-idf features of each text field independently, then add each linear model's predictions as an additional feature alongside the other features, and finally train an ExtraTreesRegressor or GradientBoostingRegressor on the combined features.
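A minimal sketch of that idea with scikit-learn, under a few assumptions that are not in the answer itself: Ridge stands in for "a linear model", TfidfVectorizer is used so raw strings can be fed in directly, and the text predictions are produced out-of-fold with cross_val_predict so the tree model is not trained on predictions that have already seen its targets. The helper name stack_text_predictions and the toy data are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline


def stack_text_predictions(text_columns, numeric_features, y, cv=3):
    """Append one column of linear-model predictions per text field.

    Hypothetical helper: text_columns is a list of lists of strings,
    one inner list per text field.
    """
    extra_columns = []
    for texts in text_columns:
        # One independent tf-idf + linear model per text field.
        text_model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
        # Out-of-fold predictions (an assumption of this sketch).
        preds = cross_val_predict(text_model, texts, y, cv=cv)
        extra_columns.append(preds.reshape(-1, 1))
    # The result is a small dense array, so tree ensembles accept it.
    return np.hstack([numeric_features] + extra_columns)


# Toy data, invented for the example.
titles = [
    "senior python developer",
    "junior java developer",
    "staff nurse",
    "head chef",
    "python data engineer",
    "care assistant",
]
descriptions = [
    "build backend services in python",
    "maintain java web applications",
    "provide patient care on the ward",
    "run a busy restaurant kitchen",
    "design data pipelines in python",
    "support residents in a care home",
]
numeric = np.array(
    [[1, 5], [1, 1], [2, 3], [3, 8], [1, 4], [2, 1]], dtype=float
)
salary = np.array([60000, 30000, 28000, 35000, 55000, 18000], dtype=float)

X = stack_text_predictions([titles, descriptions], numeric, salary)
model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, salary)
```

Here X has the two original numeric columns plus one prediction column per text field. To score new rows you would presumably refit each text model on the full training set and use its predictions as the extra columns.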

Upvotes: 5
