How to filter each column of data in a dataframe by a list of respective outliers

Question

I have a dataset of timestamps with respective values (these are not real, just an example):

Name time NOx ozone PM25 temp rain wind methane benzene xylene
1    80  40    62    3   32   79   36      53       8     91
2    13  72    40   73    1   77   36      85      66     25
3    87  83    93    2   51   28   96      28      66     37
4     6  53     3   96   42   57   75      77      77     50
5    38  61    85   36    2    4   43      17      20     59
6    54  15    75   72   81  100   72      62      86     40
7    36   5    70   44   99    1   62      49      48     66
8    73  97     7   87    7    4   59      97      49     36
9    30  89    58   28   91   72   94      85      68     25
10   12  51    78   17   82   67   52      13      67     60

And I have a separate table of some statistics I performed using "summarise", including the threshold for outliers for each parameter (all columns except "Name" and "time"):

 NOx ozone PM25 temp rain wind methane benzene xylene
53    73   67   26   59   52      45      70     75

I want to create a new table that shows the Name, Time, parameter, and value where any specific value was greater than its outlier threshold. I tried this by making both of these dataframes to the "longer" format using pivot_longer and now my main dataframe looks like this:

Name time parameter value
 1    80 NOx          40
 2    80 ozone        62
 3    80 PM25          3
 4    80 temp         32
 5    80 rain         79
 6    80 wind         36
 7    80 methane      53

And my outliers dataframe looks like this:

 parameter value
 NOx          53
 ozone        73
 PM25         67
 temp         26
 rain         59
 wind         52
 methane      45
 benzene      70
 xylene       75

I want my new table of outliers to look like the above "longer" dataframe with the respective "Name" and "time" with each parameter and value. I tried doing this using "filter":

new_a <- A_long %>%
   group_by(parameter) %>%
   filter(A_long$value > B_long$value)
Error: Result must have length 50, not 450

I did receive an error message trying this but I also don't think this code I tried will give me the table I want. Any advice?

How to filter each column of data in a dataframe by a list of respective outliers

Answers (1)

Related Questions