Ezio
Ezio

Reputation: 21

Calculate the maximum price fluctuation in a 24 hour window

I have a data frame that contains two columns - time and price. It contains a series of observations for price of a certain item at various times. Here is a sample.

> df
                  time price
1  2014-12-12 14:57:15 45.81
2  2014-12-12 14:57:15 45.90
3  2014-12-12 15:00:08 45.76
4  2014-12-12 15:00:37 45.72
5  2014-12-12 15:00:49 45.73
6  2014-12-12 15:00:49 45.72
7  2014-12-12 15:00:49 45.76
8  2014-12-12 15:00:49 45.76
9  2014-12-12 15:00:50 45.78
10 2014-12-12 15:00:57 45.76
11 2014-12-12 15:00:57 45.76
12 2014-12-12 15:01:01 45.74
13 2014-12-12 15:01:01 45.74
14 2014-12-12 15:01:08 45.74
15 2014-12-12 15:01:08 45.74
16 2014-12-12 15:01:22 48.79
17 2014-12-12 15:01:23 45.72
18 2014-12-12 15:01:26 45.86
19 2014-12-12 15:01:50 45.72
20 2014-12-12 15:02:00 45.80

At every observation point I want to calculate the maximum price fluctuation in either direction in the next 24 hour window.

> max(df$price - df$price[1])
[1] 2.98
> min(df$price - df$price[1])
[1] -0.09

So for observation 1 in the above example the max fluctuations are 2.98 and -0.09. I could write a function like

 fluc <- function(i) {
     c(max(df$price - df$price[i]), min(df$price - df$price[i]))
 }

and use lapply but this will calculate the differences over the entire data frame. I want to restrict the calculation to next 24 hours so the number of differences to compute will be different for different points of observation.

I could write a n^2 function to do this, but is there a R-friendly way to achieve this? Ideally I would also want the times when the max fluctuation occurs.

Thank you.

Adding dput output:

structure(list(time = structure(c(1418425035.677, 1418425035.677, 
1418425208.407, 1418425237.587, 1418425249.22, 1418425249.22, 
1418425249.38, 1418425249.38, 1418425250.64, 1418425257.97, 1418425257.97, 
1418425261.397, 1418425261.397, 1418425268.333, 1418425268.333, 
1418425282.207, 1418425283.403, 1418425286.083, 1418425310.893, 
1418425320.42), class = c("POSIXct", "POSIXt"), tzone = ""), 
    price = c(45.81, 45.9, 45.76, 45.72, 45.73, 45.72, 45.76, 
    45.76, 45.78, 45.76, 45.76, 45.74, 45.74, 45.74, 45.74, 48.79, 
    45.72, 45.86, 45.72, 45.8)), .Names = c("time", "price"), row.names = c(NA, 
20L), class = "data.frame")

Upvotes: 1

Views: 185

Answers (2)

ARobertson
ARobertson

Reputation: 2897

This seemed to work. I got tripped up by trying to use the date as the bottom window cutoff, but then the duplicates add 0's as a min where there wouldn't be otherwise.

as.POSIXct might not be necessary depending on what format your date is. I also used 60 seconds to make it interesting.

# create upper cutoff for each row
df$cutoff <- as.POSIXct(df$time) + 60 # 24 is 60*60*24

# a for loop works well too
result <- mapply(function(end,rowid,x){
  # create window, and return min/max
  window <- x[as.numeric(row.names(x)) >= rowid & x$time <= end,'price']
  c(min(window - window[1]),max(window - window[1]))
},end = df$cutoff,rowid = 1:nrow(df),MoreArgs = list(x = df[ ,c('time','price')]))

# do whatever with the result
cbind(df,t(result))

Update, includes the time of the max fluctuations:

df$cutoff <- as.POSIXct(df$time) + 60 # 24 is 60*60*24

result <- list()

for(i in 1:(nrow(df)-1)){
  # create window, add diffs, send matching window result
  window <- df[as.numeric(row.names(df)) >= i + 1 & df$time <= df$cutoff[i],c('time','price')]
  window$diffs <- window$price - window$price[1]
  result[[i]] <- (c(i,window[window$diffs == max(window$diffs), ],window[window$diffs == min(window$diffs), ]))
}

# prep data for merging
resultdf <- as.data.frame(do.call('rbind',result))
names(resultdf) <- c('i','maxtime','maxprice','maxdiff','mintime','minprice','mindiff')
df$rowid <- 1:nrow(df)

# merge
merge(df,resultdf,by.x = 'row.names',by.y = 'i',all.x = T,sort = F)

My guess is that *pply functions won't make anything more elegant because the starting row, main data.frame, and the max row are necessary each iteration. Preprocessing and vectorization might help a little more.

Upvotes: 0

incitatus451
incitatus451

Reputation: 163

I guess it will work. I don't think it is the best approach, but I know you want to play with the data.

            myFunc <- function(df, startDate, endDate) {
                df <- df[df$time > startDate & df$time <= endDate, ]
                gain <- as.numeric(NA)
                for(i in 2:nrow(df)) {
                    gain <- c(gain, df$price[i] - df$price[i-1])
                }

                max <- df[which(gain == max(gain, na.rm=TRUE)), ]
                min <- df[which(gain == min(gain, na.rm=TRUE)), ]
                list(max=max, min=min)
            }

            x <- myFunc(df, time[5], time[15])

Upvotes: 1

Related Questions