Reputation: 533
EDIT:
I figured it out!
df_CloseDelta$YearMonth <- as.yearmon(df_CloseDelta$date)
df_CloseDelta %>%
group_by(stock, YearMonth) %>%
summarize(minCloseDelta = min(closeDelta),
meanCloseDelta = mean(closeDelta),
maxCloseDelta = max(closeDelta)) -> df_summary_CloseDelta
I created the following data frame that shows the date, the name of the stock, and the percentage difference between the closing price of each respective stock compared to the day prior.
library(quantmod)
library(dplyr)
library(tidyr)
library(ggplot2)
library(zoo)
start <- as.Date("2014-01-01")
end <- as.Date("2017-12-31")
getSymbols(c("AAPL", "AMZN", "FB", "GOOG", "MSFT"),
from = start, to = end, return.class = "data.frame")
df_wide <- bind_cols(AAPL, AMZN, FB, GOOG, MSFT) %>%
mutate(date = as.Date(rownames(AAPL)))
df_long <- df_wide %>%
gather(key = stock, value = value, - date) %>%
separate(stock, into = c("stock", "type"))
df_panel <- df_long %>%
spread(key = type, value = value)
df_CloseDelta <- df_panel %>%
group_by(stock) %>%
mutate(closeDelta = 100 * (Close - lag(Close))/lag(Close)) %>%
select(date, stock, closeDelta) %>%
filter(!is.na(closeDelta))
I am trying to group these entries by year and month (all of the percentage changes in closing price for AAPL in January 2014, all of the percentage changes in closing price for AMZN in January 2014, etc.) I'm trying to use the yearmon() function to do this, and thought I could create (mutate) a new column where I could extract the year and month and then group the data, or just group it without creating a new column. I was able to do this:
as.yearmon(df_CloseDelta$date)
And return:
[1] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
[8] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
[15] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
[22] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
And so on for every entry.
I then tried to group this:
df_summary_CloseDelta <- df_CloseDelta %>%
group_by(as.yearmon(df_CloseDelta$date))
But received this error:
Error in mutate_impl(.data, dots) :
Column `as.yearmon(df_CloseDelta$date)` must be length 1006 (the group
size) or one, not 5030
I understand that there are 1,006 dates, but that there are 5,030 entries given that there are five stocks. I'm trying to group them, then find out the average, minimum, and maximum per month and year for each respective stock. May someone point me in the right direction?
Upvotes: 4
Views: 5923
Reputation: 269441
xts has to.monthly
which does the conversion to monthly directly so assuming that the input OHLCV data is in a set of xts objects in environment e
as per Note at the end we apply a conversion function to each such object in e
(converting both to monthly, to data frame and appending the symbol) and then rbinding the resulting data frames giving a single data.
sym2df <- function(x, env) cbind(Symbol = x, fortify.zoo(to.monthly(env[[x]], name = "")))
do.call("rbind", lapply(ls(e), sym2df, env = e))
Get the stock data into environment e
:
library(quantmod)
start <- "2014-01-01"
end <- "2017-12-31"
syms <- c("AAPL", "AMZN", "FB", "GOOG", "MSFT")
getSymbols(syms, from = start, to = end, env = e <- new.env())
Upvotes: 0
Reputation: 1904
group_by
is expecting you to give it either variable names, or a vector the same length as the number of rows in your data which will be treated as a factor by which to perform the grouping. See below as an example.
> btest <- data.frame(a = LETTERS[1:10],
+ b = c(1,1,2,2,3,3,4,4,5,5),
+ c = c(rep('e',5), rep('f',5)))
> btest
a b c
1 A 1 e
2 B 1 e
3 C 2 e
4 D 2 e
5 E 3 e
6 F 3 f
7 G 4 f
8 H 4 f
9 I 5 f
10 J 5 f
Now we can calculate the sums for our group of interest in one of two ways. The traditional way is to use group_by
and then our variable c
.
> btest %>%
+ group_by(c) %>%
+ summarise(ex = mean(b))
# A tibble: 2 x 2
c ex
<fct> <dbl>
1 e 1.80
2 f 4.20
However, what your code thinks you're trying to do is provide row-by-row values which it will use to form a grouping.
> btest %>%
+ group_by(c(1,1,1,1,1,2,2,2,2,2)) %>%
+ summarise(ex = mean(b))
# A tibble: 2 x 2
`c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)` ex
<dbl> <dbl>
1 1.00 1.80
2 2.00 4.20
Just so we're clear on what the means should be.
> mean(c(1,1,2,2,3))
[1] 1.8
> mean(c(3,4,4,5,5))
[1] 4.2
Your problem here is that you need to add the column that you want to group by first, then you can group by that.
> df_CloseDelta[['date_yearmon']] <- as.yearmon(df_CloseDelta[['date']])
>
> df_CloseDelta %>%
+ group_by(date_yearmon, stock) %>%
+ summarise(mean_closedelta = mean(closeDelta))
# A tibble: 240 x 3
# Groups: date_yearmon [?]
date_yearmon stock mean_closedelta
<S3: yearmon> <chr> <dbl>
1 Jan 2014 AAPL -0.474
2 Jan 2014 AMZN -0.472
3 Jan 2014 FB 0.746
4 Jan 2014 GOOG 0.310
5 Jan 2014 MSFT 0.104
6 Feb 2014 AAPL 0.269
7 Feb 2014 AMZN 0.0631
8 Feb 2014 FB 0.491
9 Feb 2014 GOOG 0.159
10 Feb 2014 MSFT 0.0713
# ... with 230 more rows
Alternately, if you want to do the whole thing in dplyr
, you could do the following.
df_CloseDelta %>%
mutate(date_yearmon = as.character(as.yearmon(date))) %>%
group_by(date_yearmon, stock) %>%
summarise(mean_closedelta = mean(closeDelta))
Upvotes: 1