Reputation: 43
I have a tibble of time series data. At each time step, I have a load (Loadz) and a displacement (Dispz) associated with a test identifier (FName):
# A tibble: 15,926 x 5
# Groups:   FName [92]
   TIMESTAMP             FName  Dispz Loadz Failflag
   <chr>                 <chr>  <dbl> <dbl> <chr>
 1 2020-09-19 14:56:46   D-H1  0       0    ""
 2 2020-09-19 14:56:46.5 D-H1  0.019   3.61 ""
 3 2020-09-19 14:56:47   D-H1  0.04    8.91 ""
 4 2020-09-19 14:56:47.5 D-H1  0.06   14.5  ""
 5 2020-09-19 14:56:48   D-H1  0.0790 20.1  ""
 6 2020-09-19 14:56:48.5 D-H1  0.101  25.7  ""
 7 2020-09-19 14:56:49   D-H1  0.12   31.2  ""
 8 2020-09-19 14:56:49.5 D-H1  0.142  36.1  ""
 9 2020-09-19 14:56:50   D-H1  0.162  40.9  ""
10 2020-09-19 14:56:50.5 D-H1  0.183  45.9  ""
I have the code below, which filters out all loads and displacements below a certain threshold (keeping displacements greater than 0.02 mm and loads exceeding 10 N), and then sweeps (zeroes) everything so that all tests start at the origin.
dat <- df %>%
  group_by(FName) %>%
  filter(Dispz > 0.02) %>%
  filter(Loadz > 10) %>%
  mutate_if(is.numeric, funs(. - first(.))) %>%
  slice(1:which.max(Loadz))
However, a few tests have errors and need to be corrected manually, and this is where I'm getting into trouble.
I have the FName identifiers of the misbehaving tests and the TIMESTAMP value at the point where each one needs to be zeroed, but I can't work out how to filter each group using a value looked up from this lookup tibble.
# Identify FName ids that need to be manually modified
badlist <- c("WR-H2.2",
             "WR-H2.3",
             "WR-H2.4")
# Find the associated timestamp for each respective FName label
timelist <- c("2020-10-04 12:31:06",
              "2020-10-04 12:32:28",
              "2020-10-04 12:33:46.5")
# Lookup table: sample ID and the timestamp to zero from
maninput <- tibble(x = badlist, y = timelist)
I'm not particularly good with R, so the sad attempted pseudo-code below is trying to express my intent. Essentially, it needs to look up the FName value that the data are currently grouped under (and that's my biggest issue: is this a case for "quo"?), and then find the associated TIMESTAMP value to use as the variable in the filter.
Newdat <- dat %>%
  group_by(FName) %>%
  filter(TIMESTAMP > maninput[FName == "group_by(FName)", 2])
I'm stumped, but feel the answer is pretty simple.
Upvotes: 0
Views: 36
Reputation: 320
This is a prime use case for joins. You can easily solve this by left_join-ing your tibble with the maninput one, then filtering by comparing with the added column.
You didn't really provide any example data that can be used directly, so I made my own:
library(tidyverse)
library(lubridate)
set.seed(0)
max_time = ymd_hms("2020-11-02 11:07:00 CET")
names = c("D-H1", "WR-H2.2", "WR-H2.3", "WR-H2.4")
obs = 10
df = names %>%
  map(~tibble(FName = .x,
              Dispz = cumsum(abs(rnorm(obs)) * 0.3),
              Loadz = cumsum(abs(rnorm(obs)) * 10),
              TIMESTAMP = seq(max_time - obs, max_time - 1, length.out = obs),
              Failflag = "")) %>%
  bind_rows()
dat = df %>%
  group_by(FName) %>%
  filter(Dispz > 0.02) %>%
  filter(Loadz > 10) %>%
  # across() + where() is the current replacement for mutate_if() with funs()
  mutate(across(where(is.numeric), ~.x - first(.x))) %>%
  slice(1:which.max(Loadz)) %>%
  ungroup()
badlist = c("WR-H2.2", "WR-H2.3")
timelist = c("2020-11-02 11:06:54 CET", "2020-11-02 11:06:57 CET")
maninput = tibble(x = badlist, y = ymd_hms(timelist))
Now you can do the join:
newdat = dat %>%
  group_by(FName) %>%
  left_join(maninput, by = c("FName" = "x")) %>%
  filter(is.na(y) | TIMESTAMP > y) %>%
  select(-y)
We first join on the maninput table as explained, creating a y column for every test whose FName matches an x value in maninput. Then we filter, keeping only the observations where either the y column is NA (i.e. there is no matching test in maninput) or the TIMESTAMP is greater than the y column (i.e. the observation occurred after the specified time in maninput). Finally, we remove the y column from the data, as it is no longer needed.
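If you want to sanity-check which tests actually got trimmed, one quick optional way is to compare row counts per FName before and after the filter, for example:
# Optional check: rows per test before and after the manual trim
dat %>% count(FName)
newdat %>% count(FName)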
This results in a tibble where the selected tests only have data after their respective timestamps, which I'm pretty sure is what you were looking for:
# A tibble: 23 x 5
# Groups:   FName [4]
   FName   Dispz Loadz TIMESTAMP           Failflag
   <chr>   <dbl> <dbl> <dttm>              <chr>
 1 D-H1    0      0    2020-11-02 11:06:51 ""
 2 D-H1    0.399 11.5  2020-11-02 11:06:52 ""
(...)
 9 D-H1    2.46  49.7  2020-11-02 11:06:59 ""
10 WR-H2.2 0.409 25.3  2020-11-02 11:06:55 ""
11 WR-H2.2 0.735 35.2  2020-11-02 11:06:56 ""
12 WR-H2.2 0.942 39.5  2020-11-02 11:06:57 ""
13 WR-H2.2 1.33  51.9  2020-11-02 11:06:58 ""
14 WR-H2.2 1.34  54.7  2020-11-02 11:06:59 ""
15 WR-H2.3 1.98  41.2  2020-11-02 11:06:58 ""
16 WR-H2.3 2.05  53.8  2020-11-02 11:06:59 ""
17 WR-H2.4 0     0     2020-11-02 11:06:53 ""
18 WR-H2.4 0.244  6.64 2020-11-02 11:06:54 ""
(...)
23 WR-H2.4 0.949 43.8  2020-11-02 11:06:59 ""
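One caveat when applying this to your real data: in your posted tibble TIMESTAMP is a <chr> column and your original timelist is character too, so you'll probably want both sides of the TIMESTAMP > y comparison parsed as date-times rather than left as strings. A minimal sketch, assuming lubridate's ymd_hms parses your format (it handles fractional seconds such as "12:33:46.5"):
# Sketch: parse both sides as date-times before joining and filtering
maninput <- tibble(x = badlist, y = ymd_hms(timelist))
dat <- dat %>% mutate(TIMESTAMP = ymd_hms(TIMESTAMP))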
Upvotes: 1