vp_050

Reputation: 508

Which method and tool should I use for regression analysis of a multimodal distribution in R?

I have the variables X1, X2, and Y, whose relationship is plotted below. X2 values are used for color coding.

X1, X2, and X3 are integer variables.

[Figure: plot of the relationship between the variables, color-coded by X2 (image not available)]

The observed pattern is multimodal.

What is the best way to predict Y based on X1 and X2?

Can we use non-linear or hurdle models for this?

Also, what tools are available to do this in R?

Upvotes: 1

Views: 456

Answers (1)

Robert Long

Reputation: 6897

Generally speaking, there is no need to worry about the distribution of the response. Although you are showing a bivariate plot, it is possible that the multi-modality is explained by X2 (or by other, missing variables).

It is the distribution of the model residuals that matters (if it matters at all).

If the residuals are non-normal, then certain inferences may be invalid, although this may not be a problem at all if the model is used for prediction.
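As a rough illustration of the point about residuals, here is a minimal sketch on simulated data. The data-generating process, the choice to treat X2 as a factor, and the names `dat`, `fit` and `res` are all assumptions made for the example, not taken from the question. In this simulation Y is multimodal on its own, yet the residuals of a model that includes X2 are not.

```r
# Simulated data (hypothetical): X2 shifts the mean of Y, so the marginal
# distribution of Y has several modes even though the noise is normal.
set.seed(1)
n  <- 300
X1 <- sample(1:20, n, replace = TRUE)
X2 <- sample(1:3,  n, replace = TRUE)
Y  <- 2 * X1 + 40 * X2 + rnorm(n, sd = 3)
dat <- data.frame(Y, X1, X2)

# Linear model with both predictors; X2 is entered as a factor (an assumption).
fit <- lm(Y ~ X1 + factor(X2), data = dat)

# Look at the residuals rather than at the raw response.
res <- residuals(fit)
hist(dat$Y, main = "Response Y")     # marginal distribution of Y
hist(res,   main = "Residuals")      # residuals after accounting for X1 and X2
qqnorm(res); qqline(res)             # rough visual check of normality
```

With real data, the same two histograms give a quick indication of whether the multi-modality disappears once X2 is in the model.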

If you really do have a curvilinear association then you could consider:

  • transformations
  • non-linear terms
  • splines
  • generalised additive models (GAMs)
  • non-linear models
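Some of the options above can be sketched in R as follows. This is only an illustration on simulated data: the curved relationship, the spline degrees of freedom and the model names are assumptions. `splines` ships with R, and `mgcv` is a recommended package that is normally installed with it.

```r
library(splines)  # natural cubic splines via ns()
library(mgcv)     # generalised additive models via gam()

# Simulated data (hypothetical) with a curved effect of X1.
set.seed(2)
n  <- 300
X1 <- sample(1:20, n, replace = TRUE)
X2 <- sample(1:3,  n, replace = TRUE)
Y  <- 10 * sin(X1 / 3) + 5 * X2 + rnorm(n)
dat <- data.frame(Y, X1, X2 = factor(X2))

fit_lin    <- lm(Y ~ X1 + X2, data = dat)               # straight line in X1
fit_poly   <- lm(Y ~ poly(X1, 2) + X2, data = dat)      # non-linear (quadratic) term
fit_spline <- lm(Y ~ ns(X1, df = 4) + X2, data = dat)   # natural spline
fit_gam    <- gam(Y ~ s(X1) + X2, data = dat)           # GAM with a smooth of X1

# Compare the fits; lower AIC is better.
AIC(fit_lin, fit_poly, fit_spline, fit_gam)
```

Transformations (for example `log(Y)` when Y is strictly positive) and fully non-linear models (for example `nls()` with a known functional form) follow the same pattern but depend on the shape of the actual data.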

Of course, if the underlying problem is that you have missing explanatory variables, then some of these approaches may lead to an overfitted model.
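One common way to check for overfitting is to compare models on data that were not used for fitting. The sketch below uses a simple 70/30 hold-out split on simulated data. The split ratio, the deliberately over-flexible degree-10 polynomial and the helper `rmse()` are assumptions chosen for illustration.

```r
# Simulated data (hypothetical) where the true relationship is linear.
set.seed(3)
n  <- 200
X1 <- sample(1:20, n, replace = TRUE)
X2 <- sample(1:3,  n, replace = TRUE)
Y  <- 2 * X1 + 5 * X2 + rnorm(n, sd = 4)
dat <- data.frame(Y, X1, X2 = factor(X2))

# Hold out 30% of the rows for testing.
train_idx <- sample(n, size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Root mean squared prediction error on new data (hypothetical helper).
rmse <- function(fit, newdata) {
  sqrt(mean((newdata$Y - predict(fit, newdata = newdata))^2))
}

fit_simple   <- lm(Y ~ X1 + X2, data = train)
fit_flexible <- lm(Y ~ poly(X1, 10) + X2, data = train)  # deliberately flexible

# Out-of-sample error of each model.
c(simple = rmse(fit_simple, test), flexible = rmse(fit_flexible, test))
```

If the flexible model fits the training data better but predicts the held-out data no better than the simple one, the extra flexibility is probably fitting noise.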

Upvotes: 1
