CapnShanty
CapnShanty

Reputation: 559

R get the value of one column that's based on several other columns and the max of one?

I see questions like this (Extract the maximum value within each group in a dataframe) for R, but none really get at what I need.

I have data I cleaned to look like this:

      date ticker openprice lowprice closeprice hour min
2015-11-17    EXC     28.53    28.52      28.54    9  31
2015-11-17    EXC     28.53    28.52      28.58    9  32
2015-11-17    EXC     28.57    28.54      28.57    9  33
2015-11-17    AEP     28.59    28.59      28.66    9  34
2015-11-17    AEP     28.66    28.63      28.65    9  35
2015-11-17    AEP     28.64    28.63      28.65    9  36
2015-11-18    EXC     28.53    28.52      28.54    9  31
2015-11-18    EXC     28.53    28.52      28.58    9  32
2015-11-18    EXC     28.57    28.54      28.57    9  33
2015-11-18    AEP     28.59    28.59      28.66    9  34
2015-11-18    AEP     28.66    28.63      28.65    9  35
2015-11-18    AEP     28.64    28.63      28.65    9  36

And from it, I need to get the value of closeprice at the minimum value of min for each hour, ticker, and date, and then again at the maximum value of min for each hour, ticker, and date.

The solution for the above small sample would be something like (not exactly this but so you get the idea of what I'm after):

date, ticker, hour, hour_beginning_price, hour_end_price
2015-11-17, EXC, 9, 28.54, 28.57
2015-11-17, AEP, 9, 28.66, 28.65
2015-11-18, EXC, 9, 29.54, 29.57
2015-11-18, AEP, 9, 29.66, 29.65

and so you can load the sample data in R:

blep<-read.table(header = TRUE, text = '          date ticker openprice lowprice closeprice hour min
    2015-11-17    EXC     28.53    28.52      28.54    9  31
    2015-11-17    EXC     28.53    28.52      28.58    9  32
    2015-11-17    EXC     28.57    28.54      28.57    9  33
    2015-11-17    AEP     28.59    28.59      28.66    9  34
    2015-11-17    AEP     28.66    28.63      28.65    9  35
    2015-11-17    AEP     28.64    28.63      28.65    9  36
    2015-11-18    EXC     28.53    28.52      29.54    9  31
    2015-11-18    EXC     28.53    28.52      29.58    9  32
    2015-11-18    EXC     28.57    28.54      29.57    9  33
    2015-11-18    AEP     28.59    28.59      29.66    9  34
    2015-11-18    AEP     28.66    28.63      29.65    9  35
    2015-11-18    AEP     28.64    28.63      29.65    9  36')

I've been trying to learn how to do this for a while and have nothing, I always try to solve my questions on my own first but I'm not sure how to properly combine the output of aggregate with which, or if that's even the right way to go about this:

which(apply(sd, function(x) all(x == aggregate(sd$min, by = list(sd$date, sd$ticker, sd$hour), max))))

Upvotes: 0

Views: 39

Answers (1)

slightlydeviant
slightlydeviant

Reputation: 44

Here is a solution using dplyr:

library(dplyr)

blep %>% 
  group_by(date, ticker, hour) %>% 
  arrange(date, ticker, hour, min) %>% 
  summarize(hour_beginning_price = first(closeprice),
            hour_end_price = last(closeprice))

The key is to sort by minute (min), so that the first record is the minimum value of minute within each group, and the last value is the maximum value of minute.

Upvotes: 2

Related Questions