Reputation: 447
I have a data frame with the structure below:
W01 0.750000 0.916667 0.642857 1.000000 0.619565
W02 0.880000 0.944444 0.500000 0.991228 0.675439
W03 0.729167 0.900000 0.444444 1.000000 0.611111
W04 0.809524 0.869565 0.500000 1.000000 0.709091
W05 0.625000 0.925926 0.653846 1.000000 0.589286
Variation 1_941119_A/G 1_942335_C/G 1_942451_T/C 1_942934_G/C \
W01 0.967391 0.965909 1 0.130435
W02 0.929825 0.937500 1 0.184211
W03 0.925926 0.880000 1 0.138889
W04 0.918182 0.907407 1 0.200000
W05 0.901786 0.858491 1 0.178571
Variation 1_944296_G/A ... X_155545046_C/T X_155774775_G/T \
W01 0.978261 ... 0.652174 0.641304
W02 0.938596 ... 0.728070 0.736842
W03 0.944444 ... 0.675926 0.685185
W04 0.927273 ... 0.800000 0.690909
W05 0.901786 ... 0.794643 0.705357
Variation Y_5100327_G/T Y_5100614_T/G Y_12786160_G/A Y_12914512_C/A \
W01 0.807692 0.800000 0.730769 0.807692
W02 0.655172 0.653846 0.551724 0.666667
W03 0.880000 0.909091 0.833333 0.916667
W04 0.666667 0.642857 0.580645 0.678571
W05 0.730769 0.720000 0.692308 0.720000
Variation Y_13470103_G/A Y_19705901_A/G Y_20587967_A/C mean_age
W01 0.807692 0.666667 0.333333 56.3
W02 0.678571 0.520000 0.250000 66.3
W03 0.916667 0.764706 0.291667 69.7
W04 0.666667 0.560000 0.322581 71.6
W05 0.703704 0.600000 0.346154 72.5
[5 rows x 67000 columns]
I am trying to fit a robust regression with an MM-estimator for each SNP column and gather summary statistics of the fit (the p-value and the slope) using the snippet below:
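library(tidyverse)  # gather, nest, mutate, map, unnest
library(MASS)       # rlm, psi.bisquare
library(broom)      # glance
# Fit one robust regression per SNP and collect the model summaries
df %>% gather(snp, value, -mean_age) %>%
  nest(-snp) %>%
  mutate(model = map(data, ~rlm(mean_age ~ value, data = ., method="MM", psi=psi.bisquare, maxit=50)),
         summary = map(model, glance)) %>%
  dplyr::select(-data, -model) %>%
  unnest(summary) -> linear_regression_results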
This however throws the well-known rlm singular error:
Error in rlm.default(x, y, weights, method = method, wt.method = wt.method, :
'x' is singular: singular fits are not implemented in 'rlm'
Are there any suggestions for how to resolve this error?
Upvotes: 0
Views: 1069
Reputation: 447
This problem is sometimes caused by duplicate measurements in the variables. As the data frame above shows, every value in column 1_942451_T/C is the same (all 1), so that predictor is collinear with the intercept and the design matrix is singular.
A simple, ad hoc solution to this problem is to jitter the values:
jittered_DF <- data.frame(lapply(df, jitter))
or
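r_DF <- data.frame(lapply(df, function(x) x + rnorm(length(x), sd = 1e-6)))  # sd is an arbitrary small value; rnorm(x) alone would replace the data with pure noise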
It would probably be more precise to apply jitter() only to the columns that contain duplicate values, and not to the whole data frame (which also perturbs mean_age).
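A minimal sketch of that idea, not a definitive fix: it flags predictor columns that contain at least one repeated value and jitters only those. The toy data frame and its column names (snp_a, snp_b) are hypothetical stand-ins for the real data, and treating any repeated value as the trigger is an assumption; a stricter check for fully constant columns may fit better.

```r
# Toy stand-in for the real data (hypothetical values and column names).
df <- data.frame(
  snp_a    = c(0.75, 0.88, 0.73, 0.81, 0.63),
  snp_b    = c(1, 1, 1, 1, 1),  # constant column, like 1_942451_T/C
  mean_age = c(56.3, 66.3, 69.7, 71.6, 72.5)
)

# Consider only the predictors; the response (mean_age) is left alone.
predictors <- setdiff(names(df), "mean_age")

# anyDuplicated() returns 0 when all values in a column are distinct.
has_dups <- vapply(df[predictors], function(x) anyDuplicated(x) > 0, logical(1))
dup_cols <- predictors[has_dups]

# Jitter only the flagged columns; every other column is copied unchanged.
jittered_DF <- df
jittered_DF[dup_cols] <- lapply(df[dup_cols], jitter)
```

jittered_DF can then be passed to the same gather/nest/rlm pipeline in place of df. Jittering changes the data, so the fitted slopes for the affected columns should be interpreted with care.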
Upvotes: 1