Reputation: 21
I have a dataset of timestamps with respective values (these are not real, just an example):
Name time NOx ozone PM25 temp rain wind methane benzene xylene
1 80 40 62 3 32 79 36 53 8 91
2 13 72 40 73 1 77 36 85 66 25
3 87 83 93 2 51 28 96 28 66 37
4 6 53 3 96 42 57 75 77 77 50
5 38 61 85 36 2 4 43 17 20 59
6 54 15 75 72 81 100 72 62 86 40
7 36 5 70 44 99 1 62 49 48 66
8 73 97 7 87 7 4 59 97 49 36
9 30 89 58 28 91 72 94 85 68 25
10 12 51 78 17 82 67 52 13 67 60
And I have a separate table of some statistics I performed using "summarise", including the threshold for outliers for each parameter (all columns except "Name" and "time"):
NOx ozone PM25 temp rain wind methane benzene xylene
53 73 67 26 59 52 45 70 75
I want to create a new table that shows the Name, Time, parameter, and value where any specific value was greater than its outlier threshold. I tried this by making both of these dataframes to the "longer" format using pivot_longer and now my main dataframe looks like this:
Name time parameter value
1 80 NOx 40
2 80 ozone 62
3 80 PM25 3
4 80 temp 32
5 80 rain 79
6 80 wind 36
7 80 methane 53
And my outliers dataframe looks like this:
parameter value
NOx 53
ozone 73
PM25 67
temp 26
rain 59
wind 52
methane 45
benzene 70
xylene 75
I want my new table of outliers to look like the above "longer" dataframe with the respective "Name" and "time" with each parameter and value. I tried doing this using "filter":
new_a <- A_long %>%
group_by(parameter) %>%
filter(A_long$value > B_long$value)
Error: Result must have length 50, not 450
I did receive an error message trying this but I also don't think this code I tried will give me the table I want. Any advice?
Upvotes: 1
Views: 57
Reputation: 16998
Suppose your original dataframe is named df
and your outliers are stored in df_outl
.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols=-c("Name", "time"), names_to="parameter") %>%
left_join(df_outl, by="parameter") %>%
filter(value.x >= value.y) %>%
select(Name, time, parameter, value:=value.x, -value.y)
which returns
# A tibble: 43 x 4
Name time parameter value
<dbl> <dbl> <chr> <dbl>
1 1 80 temp 32
2 1 80 rain 79
3 1 80 methane 53
4 1 80 xylene 91
5 2 13 NOx 72
6 2 13 PM25 73
7 2 13 rain 77
8 2 13 methane 85
9 3 87 NOx 83
10 3 87 ozone 93
Upvotes: 1