Reputation: 25
I've been trying my best, but not quite getting there. I'm trying to iterate through the value in a vector (df$sample) and find the first proceeding incidence of a value that is 20% less than the current value. I am trying to find this for each row (sample) and print the date of the found value to a new column.
Here's my df:
date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
...
My shotty attempts have been to use Position() or which(). I thought maybe I could wrap either of them in a for loop, but my attempts are not quite right.
for(i in length(df){
df$conc20 <- Position(function(x) x < df$sample[i]*0.80, df$sample)
}
or
for(i in length(df){
df$conc20 <- min(which(df$sample < df$sample[i]*0.8)
}
I even found a dply example that got close to what I was looking for.
Ideally:
date sample conc20
591 2020-02-14 0.008470 2020-02-25
590 2020-02-15 0.008460 ...
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
...
Any clarification I'm happy to provide. I really do appreciate the help!
Upvotes: 1
Views: 456
Reputation: 42564
If I understand correctly, this can be solved by a non-equi self join using two helper columns:
library(data.table)
setDT(df)[, rn := .I][, threshold := 0.8 * sample][
, conc20 := df[df, on = .(rn > rn, sample < threshold), mult = "first", x.date]][
, c("rn", "threshold") := NULL][]
date sample conc20 1: 2020-02-14 0.008470 2020-02-20 2: 2020-02-15 0.008460 2020-02-20 3: 2020-02-16 0.007681 2020-02-27 4: 2020-02-17 0.007144 2020-02-27 5: 2020-02-18 0.007262 2020-02-27 6: 2020-02-19 0.007300 2020-02-27 7: 2020-02-20 0.006604 <NA> 8: 2020-02-21 0.006843 2020-02-27 9: 2020-02-22 0.006687 2020-02-27 10: 2020-02-23 0.006991 2020-02-27 11: 2020-02-24 0.007333 2020-02-27 12: 2020-02-25 0.006738 2020-02-27 13: 2020-02-26 0.006279 <NA> 14: 2020-02-27 0.005300 <NA>
The first condition in the on =
clause ensures that only succeeding rows are considered, the second condition looks for sample < threshold
where threshold
has been defined beforehand as 80% of sample
. The helper column rn
contains row numbers (created via the data.table special symbol .I
). In addition, mult = "first"
tells to pick the first occurrence in case of multiple matches.
The result is appended as additional column conc20
by reference, i.e., without copying the whole dataset. Finally, the two helper columns are removed by reference.
Note that data.table chaining is used.
For demonstration, the result of the non-equi self join including all helper columns can be shown:
setDT(df)[, rn := .I][, threshold := 0.8 * sample][
df, on = .(rn > rn, sample < threshold), mult = "first"]
date sample rn threshold i.date i.sample 1: 2020-02-20 0.0067760 1 0.0052832 2020-02-14 0.008470 2: 2020-02-20 0.0067680 2 0.0052832 2020-02-15 0.008460 3: 2020-02-27 0.0061448 3 0.0042400 2020-02-16 0.007681 4: 2020-02-27 0.0057152 4 0.0042400 2020-02-17 0.007144 5: 2020-02-27 0.0058096 5 0.0042400 2020-02-18 0.007262 6: 2020-02-27 0.0058400 6 0.0042400 2020-02-19 0.007300 7: <NA> 0.0052832 7 NA 2020-02-20 0.006604 8: 2020-02-27 0.0054744 8 0.0042400 2020-02-21 0.006843 9: 2020-02-27 0.0053496 9 0.0042400 2020-02-22 0.006687 10: 2020-02-27 0.0055928 10 0.0042400 2020-02-23 0.006991 11: 2020-02-27 0.0058664 11 0.0042400 2020-02-24 0.007333 12: 2020-02-27 0.0053904 12 0.0042400 2020-02-25 0.006738 13: <NA> 0.0050232 13 NA 2020-02-26 0.006279 14: <NA> 0.0042400 14 NA 2020-02-27 0.005300
library(data.table)
df <- fread("
i date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
580 2020-02-27 0.005300
", drop = 1L)
Upvotes: 1
Reputation: 319
Quite messy, but this should do the trick
library(dplyr)
df<- read.csv( sep = " ", text=
"row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279",
)
x <- 1.05
df <- df %>%
mutate(id = 1:n()) %>%
rowwise %>%
mutate(greater_row =
first(which(sample*x <
df$sample[id:nrow(df)]) +
id-1))
df$greater_row <- df$date[df$greater_row]
This should allow you to set x
to any factor you want you want
Upvotes: 1
Reputation: 2140
Edited Answer
df<- read.csv( sep = " ", text=
"row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279",
)
df$date=as.Date(as.character(df$date))
df
#there is no row 20% below, so I am just using 2% below
# and multiplying 0.98 instead of 0.8
# Finding cross-over before current row
f_crossover_before<- function( i ){
cutoff= 0.98* df$sample[i]
res<- max(which( df$sample[1:i]<= cutoff), -1)
ifelse ( (res>0) , res , NA ) # sapply cannot return dates !
}
# Finding cross-over after current row
f_crossover_after<- function( i ){
cutoff<- 0.98* df$sample[i]
res<- min( i+which( df$sample[(i+1):nrow(df)]<= cutoff),
.Machine$integer.max )
ifelse ( (res<.Machine$integer.max) , res , NA )
}
# A column for comparison. Only for visual inspection
df$cutoff<- df$sample*0.98
df$crossover_before<- sapply( seq_along(df$sample) , FUN = f_crossover_before )
df$crossover_before<- df$date[df$crossover_before]
df$crossover_after<- sapply( seq_along(df$sample) , FUN = f_crossover_after)
df$crossover_after<- df$date[df$crossover_after]
#View(df)
Output :
# row date sample cutoff crossover_before crossover_after
#1 591 2020-02-14 0.008470 0.00830060 <NA> 2020-02-16
#2 590 2020-02-15 0.008460 0.00829080 <NA> 2020-02-16
#3 589 2020-02-16 0.007681 0.00752738 <NA> 2020-02-17
#4 588 2020-02-17 0.007144 0.00700112 <NA> 2020-02-20
#5 587 2020-02-18 0.007262 0.00711676 <NA> 2020-02-20
#6 586 2020-02-19 0.007300 0.00715400 2020-02-17 2020-02-20
#7 585 2020-02-20 0.006604 0.00647192 <NA> 2020-02-26
#8 584 2020-02-21 0.006843 0.00670614 2020-02-20 2020-02-22
#9 583 2020-02-22 0.006687 0.00655326 <NA> 2020-02-26
#10 582 2020-02-23 0.006991 0.00685118 2020-02-22 2020-02-25
#11 581 2020-02-24 0.007333 0.00718634 2020-02-23 2020-02-25
#12 580 2020-02-25 0.006738 0.00660324 <NA> 2020-02-26
#13 579 2020-02-26 0.006279 0.00615342 <NA> <NA>
Upvotes: 1