Reputation: 87
I have a data frame on which I ran the following:
PC2_filter <- samples %>%
  select(predicted, PC2) %>%
  filter(predicted == 'EUR') %>%
  summarise(mean = mean(PC2), sd = sd(PC2), sd2 = 2 * sd(PC2), sd3 = 3 * sd(PC2))
This gives me the mean together with 2 and 3 standard deviations, which I want to use as distances from the mean. I have done the same for another variable (PC1):
PC1_filter <- samples %>%
  select(predicted, PC1) %>%
  filter(predicted == 'EUR') %>%
  summarise(mean = mean(PC1), sd = sd(PC1), sd2 = 2 * sd(PC1), sd3 = 3 * sd(PC1))
The resulting table looks something like this:
mean   sd     sd2    sd3
9.24   1.73   3.47   5.21
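Since the two pipelines differ only in the column being summarised, the same numbers can also be produced for both components in one pass. Below is a minimal base-R sketch, assuming the EUR rows of the example data shown further down; eur and summary_table are illustrative names, not part of the original code.

```r
# EUR rows of the example data from the question (assumed input)
eur <- data.frame(
  PC1 = c(2, 5, 4, 12, 2, -2, 4),
  PC2 = c(3, 7, 6, 3, -10, 5, 2)
)

# One row per principal component: mean, sd, and the 2- and 3-sd distances
summary_table <- t(sapply(eur, function(x) {
  s <- sd(x)
  c(mean = mean(x), sd = s, sd2 = 2 * s, sd3 = 3 * s)
}))

round(summary_table, 2)
```

Each row of summary_table then plays the role of one of the PC1_filter / PC2_filter data frames.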
I then want to use these data frames to select the rows of my main data whose values fall within this range. For example, if sd3 = 3 and the mean is 0, I would like to select values in the range -3 to +3.
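The selection described here amounts to an inclusive interval check, mean - sd3 <= x <= mean + sd3. A minimal base-R sketch of that example (mean 0, sd3 3); within_range and the test vector values are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: TRUE where x lies within center +/- half_width (inclusive)
within_range <- function(x, center, half_width) {
  x >= center - half_width & x <= center + half_width
}

values <- c(-4, -3, -1, 0, 2.5, 3, 3.5)
keep <- within_range(values, center = 0, half_width = 3)
values[keep]  # -3 -1 0 2.5 3
```

Both comparisons are needed: one for the lower bound and one for the upper bound.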
I hope that makes sense. An example data set is below:
sample_id predicted PC1 PC2
A EUR 2 3
B EUR 5 7
C EUR 4 6
D USA -3 4
E EUR 12 3
F EUR 2 -10
G EUR -2 5
H EUR 4 2
I have tried the following, but it does not seem to select all the samples that fall within the desired range:
selected <- filter(samples, !PC1 >= (PC1_filter$mean +- PC1_filter$sd3) & !PC2 >= (PC2_filter$mean +- PC2_filter$sd3))
Any help would be greatly appreciated, as I am relatively new to R.
Let me know if you need better examples. Thanks
Upvotes: 0
Views: 121
Reputation: 160437
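The +- in your code is not a plus-or-minus operator in R: a +- b is parsed as a + (-b), i.e. a - b, so each condition only compares against the lower bound. In addition, !PC1 >= x is read as !(PC1 >= x), so your filter keeps values below that lower bound rather than values inside the range.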
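To see why the attempt in the question misbehaves, here is a small sketch of how R parses its two suspicious pieces; m, s and x are stand-in values, not taken from the question's data.

```r
m <- 10  # stand-in for a mean
s <- 3   # stand-in for sd3

# `+-` is two operators: m +- s is read as m + (-s), i.e. m - s
m +- s   # 7

# `!` has lower precedence than `>=`, so !x >= y means !(x >= y), i.e. x < y
x <- 5
!x >= 7  # TRUE, because 5 < 7
```

So a range check needs two explicit comparisons, one against mean - sd3 and one against mean + sd3.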
Perhaps something like:
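library(dplyr)  # provides filter() and between()

samples %>%
  filter(
    # keep rows where both PC1 and PC2 lie within mean +/- sd3 (bounds inclusive)
    between(PC1, PC1_filter$mean - PC1_filter$sd3, PC1_filter$mean + PC1_filter$sd3),
    between(PC2, PC2_filter$mean - PC2_filter$sd3, PC2_filter$mean + PC2_filter$sd3)
  )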
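For a quick check against the example data in the question, here is a base-R sketch of the same idea without dplyr (a swapped-in technique: logical indexing instead of filter() and between()). filter_within_k_sd is a hypothetical helper. As in the question's code, the mean and sd come from the EUR rows only, and the bounds are then applied to every row of samples, which is why the USA sample D can pass. Like between(), the comparisons are inclusive.

```r
# Example data from the question
samples <- data.frame(
  sample_id = c("A", "B", "C", "D", "E", "F", "G", "H"),
  predicted = c("EUR", "EUR", "EUR", "USA", "EUR", "EUR", "EUR", "EUR"),
  PC1 = c(2, 5, 4, -3, 12, 2, -2, 4),
  PC2 = c(3, 7, 6, 4, 3, -10, 5, 2)
)

# Hypothetical helper: keep rows whose PC1 and PC2 both lie within
# mean +/- k * sd, where mean and sd are computed on the reference group only
filter_within_k_sd <- function(df, k, group = "EUR") {
  ref <- df[df$predicted == group, ]
  in_band <- function(x, ref_x) {
    m <- mean(ref_x)
    half <- k * sd(ref_x)
    x >= m - half & x <= m + half
  }
  df[in_band(df$PC1, ref$PC1) & in_band(df$PC2, ref$PC2), ]
}

filter_within_k_sd(samples, k = 3)$sample_id  # all eight samples
filter_within_k_sd(samples, k = 2)$sample_id  # F drops out (PC2 = -10)
filter_within_k_sd(samples, k = 1)$sample_id  # "A" "B" "C" "H"
```

The k = 3 call corresponds to the sd3 filter above; changing k gives the sd2 (or any other) band without computing extra summary columns.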
Upvotes: 2