Reputation: 508

How to find variance and count periods until condition is met after event in R

--sorry for the bad title, any suggestions to make it more clear?--

I have the following data frame:

df <- data.frame( day = c(1,2,3,4,5,6,7,8,9,10,11),
                  score = c(67,51,52,57,66,63,63,68,64,57,77),
                  attempt = c(0,1,0,1,0,0,0,1,0,0,0))

I want to calculate how many days it takes to get over a threshold of >10% in scores each time an attempt occurs. The threshold should only be calculated for scores that occurred after the attempt.

The threshold is simply calculated as the percentage difference from the attempts until the next score which is >10%. For the first value in the table below this would be 57/51-1=0.12

Attempt = 1

As a result, I would like a table which shows me the actual percentage deviation over the score at the time the attempt as well as the number of days it took.

Day Score  Attempt  Threshold   Periods
1    67       0     
2    51       1        12%         1
3    52       0     
4    57       1        16%         0
5    66       0     
6    63       0     
7    63       0     
8    68       1        13%         2
9    64       0     
10   57       0     
11   77       0

Upvotes: 1

Answers (1)

KenHBS

Reputation: 7174

If you don't have your threshold yet, you can calculate it as follows. I'll assume you have some starting point start_score:

start_score  <- 45
later_scores <- df$score[df$attempt == 1]

target       <- c(start_score, later_scores)
# 45 51 57 68      From these we want to calculate percentage increase:    

# -length(target) to remove the last value of target from the denominator
pct_increase <- (diff(target) / target[-length(target)]) * 100

df$threshold[df$attempt == 1] <- pct_increase

With the threshold column in place, we can proceed: Find the rows in which df$threshold > 10, then precede those indices with a zero and calculate the number of rows (i.e. periods) that are between the rows with df$threshold > 10:

inds <- c(0, which(df$threshold > 10))

df$periods       <- rep(NA, 11)
df$periods[inds] <- diff(inds)-1

# day score attempt threshold    periods
# 1    67       0        NA      NA
# 2    51       1  13.33333       1
# 3    52       0        NA      NA
# 4    57       1  11.76471       1
# 5    66       0        NA      NA
# 6    63       0        NA      NA
# 7    63       0        NA      NA
# 8    68       1  19.29825       3
# 9    64       0        NA      NA
#10    57       0        NA      NA
#11    77       0        NA      NA

Edit to find sell point 'attempt': The first value in 'df$score' is troublesome for illustration, because if you buy at that point, you'll only be able to sell at +10% in the last period. However, if you have already bought, you should've sold in the first period immediately. Therefore I deleted this value from the data frame:

df <- data.frame(day = c(2,3,4,5,6,7,8,9,10,11),
                 score = c(51,52,57,66,63,63,68,64,57,77),
                 attempt = c(1, rep(NA, 9)))

If I understand correctly, you will sell after a score is reached that is 10% higher than the score in the period in which you bought. Parallel to this, you also immediately buy again in the period you sell, right? So you wait with selling your newly bought shares (?) until the scores have again risen by 10%:

sell_time1 <- 1
repeat{  
  sell_thres <- df$score[sell_time1] * 1.1
  sell_time2 <- min( which( (df$score > sell_thres) & (df$day > df$day[sell_time1]) ))
  ifelse(sell_time2 == sell_time1, break, sell_time1 <- sell_time2)
  df$attempt[sell_time1] <- 1
}

This will produce a warning, because at some point sell_time2 in the second line of the repeat{} will try to take the minimum value of an empty vector. In this application, this is nothing to worry about. It will result in:

#    day  score attempt
#    2    51       1
#    3    52      NA
#    4    57       1
#    5    66       1
#    6    63      NA
#    7    63      NA
#    8    68      NA
#    9    64      NA
#   10    57      NA
#   11    77       1

Upvotes: 1

How to find variance and count periods until condition is met after event in R

Answers (1)

Related Questions