hdjc90

Reputation: 87

Filter + - values and plot R

I have a data frame on which I ran the following:

PC2_filter <- samples %>%
  select(predicted, PC2) %>%
  filter(predicted == 'EUR') %>%
  summarise(mean = mean(PC2), sd = sd(PC2), sd2 = 2 * sd(PC2), sd3 = 3 * sd(PC2))

This is to get both 2 standard deviations and 3 standard deviations from the mean. I have also done this for another variable (PC1):

PC1_filter <- samples %>%
  select(predicted, PC1) %>%
  filter(predicted == 'EUR') %>%
  summarise(mean = mean(PC1), sd = sd(PC1), sd2 = 2 * sd(PC1), sd3 = 3 * sd(PC1))

So the resulting table will look something like this:

mean   sd     sd2     sd3
9.24  1.73    3.47   5.21

I then want to use these data frames to select the values in my main data that fall within this range of the mean. For example, if sd3 were 3 and the mean were 0, I would like to select values in the range -3 to +3.

I hope that makes sense. An example data set is below:

sample_id   predicted   PC1   PC2
A           EUR           2     3
B           EUR           5     7
C           EUR           4     6
D           USA          -3     4
E           EUR          12     3
F           EUR           2   -10
G           EUR          -2     5
H           EUR           4     2

I have tried the following but it seems to not be selecting all the samples that fall into the desired range:

selected <- filter(samples, !PC1 >= (PC1_filter$mean +- PC1_filter$sd3) & !PC2 >= (PC2_filter$mean +- PC2_filter$sd3))

Any help would be greatly appreciated as I am relatively new to R.

Let me know if you need better examples. Thanks

Upvotes: 0

Views: 121

Answers (1)

r2evans

Reputation: 160437

+- is not a plus-or-minus operator in R. It parses as a + followed by a unary -, so mean +- sd3 is just mean - sd3, which is not what you want here.
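A quick base-R check of how the attempted filter expression parses may help. The values `m`, `s`, and `x` are made up for illustration and are not from the question's data. It also shows that `!` binds more loosely than `>=`, so `!x >= y` is read as `!(x >= y)`:

```r
# Made-up values purely for illustration
m <- 0
s <- 3

# `+-` is a binary plus followed by a unary minus, so this is m - s
lower_only <- m +- s
same_as_minus <- identical(lower_only, m - s)

# `!` has lower precedence than `>=`, so !x >= y means !(x >= y), i.e. x < y
x <- c(-5, -1, 2, 4)
same_as_less_than <- identical(!x >= (m +- s), x < (m - s))

print(lower_only)
print(same_as_minus)
print(same_as_less_than)
```

So each condition in the attempted filter amounts to "value is below mean - sd3", which would explain why it does not return the samples inside the range.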

Perhaps something like:

samples %>%
  filter(
    between(PC1, PC1_filter$mean - PC1_filter$sd3, PC1_filter$mean + PC1_filter$sd3),
    between(PC2, PC2_filter$mean - PC2_filter$sd3, PC2_filter$mean + PC2_filter$sd3)
  )
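For something you can run end to end, here is a minimal sketch on the example data from the question. It assumes dplyr 1.0.0 or later (for `across()`). The names `eur` and `k` are mine, and `abs(value - mean) <= k * sd` is just another way of writing the same inclusive range check that `between()` does:

```r
library(dplyr)

# Example data copied from the question
samples <- data.frame(
  sample_id = c("A", "B", "C", "D", "E", "F", "G", "H"),
  predicted = c("EUR", "EUR", "EUR", "USA", "EUR", "EUR", "EUR", "EUR"),
  PC1 = c(2, 5, 4, -3, 12, 2, -2, 4),
  PC2 = c(3, 7, 6, 4, 3, -10, 5, 2)
)

# Mean and SD of both components for the EUR samples, in one summary row;
# across() names the columns PC1_mean, PC1_sd, PC2_mean, PC2_sd
eur <- samples %>%
  filter(predicted == "EUR") %>%
  summarise(across(c(PC1, PC2), list(mean = mean, sd = sd)))

k <- 2  # number of standard deviations to allow (the question uses 2 and 3)

# Keep rows whose PC1 and PC2 are both within k SDs of the EUR mean
selected <- samples %>%
  filter(
    abs(PC1 - eur$PC1_mean) <= k * eur$PC1_sd,
    abs(PC2 - eur$PC2_mean) <= k * eur$PC2_sd
  )

print(selected)
```

On this small sample, k = 2 drops only sample F (its PC2 of -10 is more than two SDs below the EUR mean), while k = 3 keeps every row. Note that the final filter is applied to all of samples, so the USA row is kept as well; add predicted == "EUR" to the filter if you only want EUR samples.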

Upvotes: 2
