Reputation: 95
I am trying to subset a large data matrix, an example of which is below:
     col 1   col 2   col 3
[1,] 855.815 749.574 754.950
[2,] 855.718 749.496 755.004
[3,] 855.846 749.359 754.910
[4,] 855.746 749.299 754.795
[5,] 855.805 749.421 754.883
I am trying to remove the columns whose first-row value lies more than one standard deviation above or below the mean of the first row, using this code:
library(matrixStats)
x = data[,-1] > (rowMeans(data[,-1]) + rowSds(data[,-1]))
y = data[,-1] < (rowMeans(data[,-1]) - rowSds(data[,-1]))
subset(df2, !(x | y))
But this returns the following error when applied to my dataset:
Error in x[subset & !is.na(subset), vars, drop = drop] :
(subscript) logical subscript too long
As I understand it, R has expanded this to read:
subset(df2, !(data[,-1] > (rowMeans(data[,-1]) + rowSds(data[,-1]))|data[,-1] < (rowMeans(data[,-1]) - rowSds(data[,-1]))))
and the logical argument is simply too long. Is there something I am missing? I am inexperienced with R and am sure there are neater ways to do this, but from what I have read I thought subset would be the most useful approach.
Thank you in advance.
Upvotes: 1
Views: 119
Reputation: 23109
You can try this:
df <- as.matrix(read.table(text='C1 C2 C3
[1,] 855.815 749.574 754.950
[2,] 855.718 749.496 755.004
[3,] 855.846 749.359 754.910
[4,] 855.746 749.299 754.795
[5,] 855.805 749.421 754.883', header=TRUE))
library(matrixStats)
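# Keep only the columns whose first-row value is within one SD of the first-row mean.
# drop = FALSE keeps the result a matrix even if a single column survives.
df[, which(abs(df[1,] - rowMeans(df)[1]) < rowSds(df)[1]), drop = FALSE]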
#          C2      C3
#[1,] 749.574 754.950
#[2,] 749.496 755.004
#[3,] 749.359 754.910
#[4,] 749.299 754.795
#[5,] 749.421 754.883
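As for why the original attempt fails, as far as I can tell: data[,-1] drops the first column rather than selecting the first row, so the comparison produces a logical matrix with one entry per cell. subset() then uses that as a row index, and it has more entries than there are rows, hence "logical subscript too long". What you need is one logical value per column. A minimal base-R sketch of the same filter, without matrixStats (the names m, first, keep and result are just illustrative):

```r
# Example matrix from the question (column names C1..C3 as used above).
m <- matrix(c(855.815, 749.574, 754.950,
              855.718, 749.496, 755.004,
              855.846, 749.359, 754.910,
              855.746, 749.299, 754.795,
              855.805, 749.421, 754.883),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("C1", "C2", "C3")))

first <- m[1, ]                                # values of the first row
keep  <- abs(first - mean(first)) < sd(first)  # one logical value per column

# Index the columns directly; drop = FALSE keeps a matrix
# even when only one column passes the filter.
result <- m[, keep, drop = FALSE]
```

Here keep has length ncol(m), so it can be used safely as a column index, and result holds the same C2 and C3 columns as the output above.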
Upvotes: 1