Reputation: 987
I have a dataset in which each sample has a feature vector consisting of x and about 9,000 other features, plus a corresponding target value y. Both x and y are continuous values (between 0 and 20). x is noisy, but we cannot identify the source of the noise. The goal is to predict y from x and the other features (which are not noisy). There are about 900,000 samples. What machine learning approaches can I use for this problem? I am also interested in well-known neural network or deep learning architectures.
Upvotes: 0
Views: 644
Reputation: 77880
This sounds to me like a standard regression problem, although your prediction correlation is going to suck (technical term :-) ) more and more as the noise in x grows. Look up the educational examples for predicting housing prices (often used to illustrate linear regression trained with gradient descent). You have 9000 features instead of 3 or 4, but that's just a matter of training time.
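As a rough illustration of that kind of textbook example, here is a minimal sketch of linear regression fitted by full-batch gradient descent with NumPy. Everything in it is an assumption made for illustration: the synthetic data, the 5 features standing in for the 9000, the helper name `fit_linear_gd`, and the learning rate. The first column is corrupted with Gaussian noise to play the role of the noisy x, so the fit on clean data can be compared with the fit on noisy data.

```python
import numpy as np


def fit_linear_gd(X, y, lr=0.1, epochs=500):
    """Fit y ~ X @ w + b by full-batch gradient descent on mean squared error."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y
        # Gradients of mean((X @ w + b - y) ** 2) with respect to w and b.
        w -= lr * (2.0 / n) * (X.T @ residual)
        b -= lr * (2.0 / n) * residual.sum()
    return w, b


# Synthetic stand-in data (hypothetical, not the asker's dataset).
rng = np.random.default_rng(0)
n_samples, n_features = 2000, 5
X_clean = rng.normal(size=(n_samples, n_features))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X_clean @ true_w + 4.0

# Column 0 plays the role of the noisy feature x.
X_noisy = X_clean.copy()
X_noisy[:, 0] += rng.normal(scale=1.0, size=n_samples)

w_clean, b_clean = fit_linear_gd(X_clean, y)
w_noisy, b_noisy = fit_linear_gd(X_noisy, y)

corr_clean = np.corrcoef(X_clean @ w_clean + b_clean, y)[0, 1]
corr_noisy = np.corrcoef(X_noisy @ w_noisy + b_noisy, y)[0, 1]
print(f"correlation with clean x: {corr_clean:.3f}")
print(f"correlation with noisy x: {corr_noisy:.3f}")
```

The weight learned for the noisy column shrinks toward zero and the prediction correlation drops, which is the effect described above. With 900,000 samples and 9000 features you would most likely want mini-batches and standardized features rather than full-batch updates, but the update rule is the same.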
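You might also consider reducing the number of features. The simplest option is feature selection: eliminate the features that don't contribute enough to y (correlation coefficient near 0.0). A related family of techniques is "dimensionality reduction", such as factor analysis or PCA (Principal Component Analysis); note that PCA builds new combined features from the inputs alone, without looking at y.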
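Both ideas can be sketched in a few lines of NumPy. This is only an illustrative sketch under assumptions: the helper names `select_by_correlation` and `pca_project`, the 0.1 threshold, and the synthetic 20-feature data are all made up for the example. The first helper keeps columns whose absolute Pearson correlation with y exceeds a threshold; the second projects the centered data onto its top principal components using an SVD.

```python
import numpy as np


def select_by_correlation(X, y, threshold=0.05):
    """Return (indices of columns with |Pearson r| > threshold, all r values)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; treat their correlation as 0.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.flatnonzero(np.abs(r) > threshold), r


def pca_project(X, n_components):
    """Project centered X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


# Synthetic stand-in data: only columns 3 and 7 actually influence y.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 20))
y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + rng.normal(scale=0.5, size=5000)

kept, r = select_by_correlation(X, y, threshold=0.1)
Z = pca_project(X, n_components=5)
print("columns kept by correlation filter:", kept)
print("shape after PCA projection:", Z.shape)
```

The correlation filter only detects linear, one-feature-at-a-time relationships, so a feature that matters only in combination with others could be dropped by mistake. PCA has the opposite blind spot: it keeps the directions of largest variance in the inputs, which are not guaranteed to be the ones that predict y.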
Upvotes: 1