Reputation: 565
> matrix(c(c(0, 3.75882e-06, 3.71645e-05, 2.16088e-06, 1.357e-06, 1.19274e-06, NaN, 1.14748e-06, 9.3314e-07), c(3.75882e-06, 0, 3.94165e-05, 3.58464e-06, 3.60392e-06, 3.43881e-06, NaN, 3.39315e-06, 3.17616e-06), c(3.71645e-05, 3.94165e-05, 0, 3.78173e-05, 3.70121e-05, 3.68449e-05, NaN, 3.6798e-05, 3.65591e-05), c(2.16088e-06, 3.58464e-06, 3.78173e-05, 0, 2.00581e-06, 1.84085e-06, NaN, 1.79527e-06, 1.57976e-06), c(1.357e-06, 3.60392e-06, 3.70121e-05, 2.00581e-06, 0, 1.03709e-06, NaN, 9.91615e-07, 7.77135e-07), c(1.19274e-06, 3.43881e-06, 3.68449e-05, 1.84085e-06, 1.03709e-06, 0, NaN, 8.27333e-07, 6.12979e-07), c(NaN, NaN, NaN, NaN, NaN, NaN, 0, NaN, NaN), c(1.14748e-06, 3.39315e-06, 3.6798e-05, 1.79527e-06, 9.91615e-07, 8.27333e-07, NaN, 0, 5.67856e-07), c(9.3314e-07, 3.17616e-06, 3.65591e-05, 1.57976e-06, 7.77135e-07, 6.12979e-07, NaN, 5.67856e-07, 0)), ncol=9)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 0.00000e+00 3.75882e-06 3.71645e-05 2.16088e-06 1.35700e-06 1.19274e-06 NaN 1.14748e-06 9.33140e-07
[2,] 3.75882e-06 0.00000e+00 3.94165e-05 3.58464e-06 3.60392e-06 3.43881e-06 NaN 3.39315e-06 3.17616e-06
[3,] 3.71645e-05 3.94165e-05 0.00000e+00 3.78173e-05 3.70121e-05 3.68449e-05 NaN 3.67980e-05 3.65591e-05
[4,] 2.16088e-06 3.58464e-06 3.78173e-05 0.00000e+00 2.00581e-06 1.84085e-06 NaN 1.79527e-06 1.57976e-06
[5,] 1.35700e-06 3.60392e-06 3.70121e-05 2.00581e-06 0.00000e+00 1.03709e-06 NaN 9.91615e-07 7.77135e-07
[6,] 1.19274e-06 3.43881e-06 3.68449e-05 1.84085e-06 1.03709e-06 0.00000e+00 NaN 8.27333e-07 6.12979e-07
[7,] NaN NaN NaN NaN NaN NaN 0 NaN NaN
[8,] 1.14748e-06 3.39315e-06 3.67980e-05 1.79527e-06 9.91615e-07 8.27333e-07 NaN 0.00000e+00 5.67856e-07
[9,] 9.33140e-07 3.17616e-06 3.65591e-05 1.57976e-06 7.77135e-07 6.12979e-07 NaN 5.67856e-07 0.00000e+00
I have a bunch of matrices like the one above. They are filled with numeric elements, except for certain rows and columns that consist of NaNs. At the intersection of such a row and column there is always a zero. Note that the example above has only one row and one column containing NaN, but in reality I may have several such rows and columns.
I want to write a function that automatically removes the rows and columns that consist almost entirely of NaNs. How can I achieve this?
Upvotes: 2
Views: 88
Reputation: 263382
Logical indexing with rowSums and colSums (in the right locations) gives a very compact and efficient answer:
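# a row has ncol(M) entries and a column has nrow(M) entries, so those are the totals to compare against
M[rowSums(is.na(M)) < 0.8*ncol(M), ][ , colSums(is.na(M)) < 0.8*nrow(M)]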
[,1] [,2] [,3] [,4] [,5]
[1,] 0.00000e+00 3.75882e-06 3.71645e-05 2.16088e-06 1.35700e-06
[2,] 3.75882e-06 0.00000e+00 3.94165e-05 3.58464e-06 3.60392e-06
[3,] 3.71645e-05 3.94165e-05 0.00000e+00 3.78173e-05 3.70121e-05
[4,] 2.16088e-06 3.58464e-06 3.78173e-05 0.00000e+00 2.00581e-06
[5,] 1.35700e-06 3.60392e-06 3.70121e-05 2.00581e-06 0.00000e+00
[6,] 1.19274e-06 3.43881e-06 3.68449e-05 1.84085e-06 1.03709e-06
[7,] 1.14748e-06 3.39315e-06 3.67980e-05 1.79527e-06 9.91615e-07
[8,] 9.33140e-07 3.17616e-06 3.65591e-05 1.57976e-06 7.77135e-07
[,6] [,7] [,8]
[1,] 1.19274e-06 1.14748e-06 9.33140e-07
[2,] 3.43881e-06 3.39315e-06 3.17616e-06
[3,] 3.68449e-05 3.67980e-05 3.65591e-05
[4,] 1.84085e-06 1.79527e-06 1.57976e-06
[5,] 1.03709e-06 9.91615e-07 7.77135e-07
[6,] 0.00000e+00 8.27333e-07 6.12979e-07
[7,] 8.27333e-07 0.00000e+00 5.67856e-07
[8,] 6.12979e-07 5.67856e-07 0.00000e+00
You can even do it in one step:
M[rowSums(is.na(M)) < 0.8*ncol(M), colSums(is.na(M)) < 0.8*nrow(M)]
[,1] [,2] [,3] [,4] [,5]
[1,] 0.00000e+00 3.75882e-06 3.71645e-05 2.16088e-06 1.35700e-06
[2,] 3.75882e-06 0.00000e+00 3.94165e-05 3.58464e-06 3.60392e-06
[3,] 3.71645e-05 3.94165e-05 0.00000e+00 3.78173e-05 3.70121e-05
[4,] 2.16088e-06 3.58464e-06 3.78173e-05 0.00000e+00 2.00581e-06
[5,] 1.35700e-06 3.60392e-06 3.70121e-05 2.00581e-06 0.00000e+00
[6,] 1.19274e-06 3.43881e-06 3.68449e-05 1.84085e-06 1.03709e-06
[7,] 1.14748e-06 3.39315e-06 3.67980e-05 1.79527e-06 9.91615e-07
[8,] 9.33140e-07 3.17616e-06 3.65591e-05 1.57976e-06 7.77135e-07
[,6] [,7] [,8]
[1,] 1.19274e-06 1.14748e-06 9.33140e-07
[2,] 3.43881e-06 3.39315e-06 3.17616e-06
[3,] 3.68449e-05 3.67980e-05 3.65591e-05
[4,] 1.84085e-06 1.79527e-06 1.57976e-06
[5,] 1.03709e-06 9.91615e-07 7.77135e-07
[6,] 0.00000e+00 8.27333e-07 6.12979e-07
[7,] 8.27333e-07 0.00000e+00 5.67856e-07
[8,] 6.12979e-07 5.67856e-07 0.00000e+00
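And if you were sure that each of those rows and columns held exactly one fewer NaN than its length (everything except the zero), then the logical tests could be exact: < (ncol(M)-1) for the rows and < (nrow(M)-1) for the columns. Note that it has to be a strict <, since <= would keep the very rows and columns you want to drop.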
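Since the question asks for a function, here is a minimal sketch of how the indexing above could be wrapped up. The name drop_nan_dims and the threshold argument are my own illustrative choices, not anything built into R. It uses rowMeans/colMeans so the cutoff is a fraction of each row's or column's length, and drop = FALSE so the result stays a matrix even if only one row or column survives. The toy matrix is 4x4, so its bad row is only 75% NaN and I pass a lower threshold than the 0.8 used above.

```r
# Hypothetical helper: the name and the `threshold` argument are illustrative.
# Drops every row and column whose share of NA/NaN entries is >= `threshold`.
drop_nan_dims <- function(M, threshold = 0.8) {
  stopifnot(is.matrix(M))
  na_mask <- is.na(M)                          # is.na() is TRUE for NaN as well as NA
  keep_rows <- rowMeans(na_mask) < threshold   # fraction of missing values per row
  keep_cols <- colMeans(na_mask) < threshold   # fraction of missing values per column
  # drop = FALSE keeps the result a matrix even if a single row/column remains
  M[keep_rows, keep_cols, drop = FALSE]
}

# Toy example: row 3 and column 3 are NaN except for the zero where they cross
M <- matrix(c(0,   1,   NaN, 2,
              1,   0,   NaN, 3,
              NaN, NaN, 0,   NaN,
              2,   3,   NaN, 0), ncol = 4, byrow = TRUE)

cleaned <- drop_nan_dims(M, threshold = 0.5)
dim(cleaned)
```

To run it over a whole list of matrices, something like lapply(list_of_matrices, drop_nan_dims) should work, where list_of_matrices stands for whatever list you keep them in.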
Upvotes: 5