Multicollinearity#

Multicolinearity

Cause#

  1. Structural: e.g. \(x \ \&\ x^2\), interaction term and the main effect included in the interaction term

  2. Data: by nature two variables are correlatd

Eeffect#

  • Inference of covariate

    • \(\beta\) becomes sensitive to small change in the model

    • imprecise (i.e. large s.e.) of \(\beta\rightarrow\) large p-value, low power

  • NO effect on goodness-of-fit nor on model prediction

Diagnosis#

  • VIF: variance inflation factor of variable \(x_i\), higher the worse (usually < 5 is moderate), is defined as:

    \[\text{VIF}_i = \frac{1}{1-R_i^2}\]

    where \(R_i^2\) is the coefficient of determination when regression \(X_i\) on the rest of the indepdent variables.

Treatment ​#

  1. Structural: standardize indepndent variable, e.g. \(X' = X - \bar{X}\)

  2. Variables that are not of interest: no need to worry

  3. Variables that are of interest:

    1. remove some of the highly correlated variables

    2. Linearly combine the variables, e.g. adding them together

    3. PCA or PLSR (partial least squares regression, ?)