Reputation: 31
I have a large dataframe of financial data in long format with over 1000 assets and I'm trying to analyze situations where the returns of one stock is affecting the returns of other stocks. I want to filter a plot to see how the values for one asset are, on the day that the value for another asset is X.
I've created a mock dataframe to illustrate my issue:
library(dplyr)
library(ggplot2)
the_date <- c('01-01-1990', '02-01-1990', '03-01-1990', '04-01-1990', '05-01-1990', '01-01-1990', '02-01-1990', '01-01-1990', '02-01-1990','03-01-1990')
the_asset <- c('AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT','AMZN', 'AMZN', 'AMZN')
the_price <- as.numeric(c(5,6,4,7,8,12,14,50,48,62))
the_returns <- as.numeric(c(0.1, -0.2, 0.14, 0.01, 0.05, -0.002, -0.11, 0.07, 0.08, 0.22))
test_df1 <- data.frame(the_date, the_asset, the_price, the_returns)
test_df1 <- test_df1 %>%
group_by(the_asset) %>%
mutate(quartile = ntile(the_returns, n=4))
I then plot the returns by quartile in ggplot:
test_df1 %>%
group_by(quartile) %>%
ggplot(aes(x = quartile, y = the_returns)) +
geom_bar(stat = 'identity') +
ggtitle('Returns by Quartile')
I would like to filter this plot in the following manner:
I've thought of making the dataframe into wide format to get each asset as a separate column however I'm not sure if that is the best option or how to proceed exactly.
Upvotes: 0
Views: 106
Reputation: 1726
You can use SQL to approach that (I am sure you can do the same with dplyr, but I am better at sql...).
# Add a column with the count of dates that are the same
library(sqldf) # you might have to install other packages to make this one work
test_df1<-sqldf("SELECT count(*) OVER (PARTITION BY the_date) AS SAME_DATE, *
FROM test_df1 ")
And the plot:
ggplot(test_df1,
aes(
x = SAME_DATE,
y = the_returns
)) +
geom_bar(stat = 'identity', aes(fill =the_asset))
ggtitle('Returns by Quartile')
EDIT: Perhaps this is better:
ggplot(test_df1,
aes(
x = SAME_DATE,
y = the_returns
)) +
geom_bar(stat = 'identity', alpha = 0.5) +
geom_jitter(aes(shape = the_asset, color = the_asset),size = 4)
ggtitle('Returns by Date')
Upvotes: 1
Reputation: 155
This should work, but on the small dataset it does not return a nice plot, as there are too few rows left after filtering to the conditions you mentioned:
test_df1 %>%
spread(the_asset,quartile, fill=0) %>%
filter(AMZN==1) %>%
ggplot(aes(x = as.factor(AAPL), y = the_returns)) +
geom_bar(stat = 'identity') +
ggtitle('Returns by Quartile')
Upvotes: 1