Reputation: 357
I have a data frame with 60 variables, and every variable has missing values, so almost none of the rows are complete:
complete.cases(data)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[28] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[55] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[82] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[136] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
So I guess I cannot use linear regression to impute missing values. Any idea how I can handle them so that I can perform my linear regression?
Upvotes: 0
Views: 1668
Reputation: 992
Options for handling missing values include:
Omitting variables for which most observations are missing.
Omitting the rows/observations/cases with missing values. This strategy is known as listwise deletion or complete case analysis. It is an option if the missingness is MCAR (Missing Completely At Random) and the sample is still large enough after the deletions.
Applying an imputation technique: mean/median/mode substitution, regression imputation, expectation-maximization (EM), multiple imputation, etc.
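As a rough illustration of the first three options in base R, here is a minimal sketch. The data frame `toy`, its column names, and the 50% cutoff `max_na` are made up for the example and are not taken from your data; replace them with your own. Median substitution is used only because it is the simplest technique to show.

```r
# Toy data standing in for the real `data` (all values are invented).
toy <- data.frame(
  y  = c(1.2, NA, 3.1, 4.0, 5.3, NA, 7.1, 8.4),
  x1 = c(NA, 2, 3, NA, 5, 6, 7, 8),
  x2 = c(NA, NA, NA, NA, NA, NA, 1, 2),   # mostly missing
  x3 = c(10, 20, NA, 40, 50, 60, 70, NA)
)

# 1. Share of missing values per variable; drop variables above a cutoff.
na_share <- colMeans(is.na(toy))
max_na   <- 0.5                            # hypothetical threshold
reduced  <- toy[, na_share <= max_na, drop = FALSE]

# 2. Listwise deletion: keep only rows with no missing value left.
complete_rows <- na.omit(reduced)

# 3. Median substitution for the predictors (the response y is left alone).
imputed    <- reduced
predictors <- setdiff(names(imputed), "y")
for (v in predictors) {
  imputed[[v]][is.na(imputed[[v]])] <- median(imputed[[v]], na.rm = TRUE)
}

# lm() drops the rows where y is still NA (default na.action = na.omit).
fit <- lm(y ~ ., data = imputed)
```

Keep in mind that single imputation like this treats the filled-in values as if they had been observed, so the standard errors from `lm()` will tend to be too optimistic. If that matters for your analysis, multiple imputation (for example with a package such as `mice`) is usually the safer route.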
Upvotes: 1