Klakteuh
Klakteuh

Reputation: 31

R: Filter for rows where a value in another row matches a condition

I have a large dataframe of financial data in long format with over 1000 assets and I'm trying to analyze situations where the returns of one stock is affecting the returns of other stocks. I want to filter a plot to see how the values for one asset are, on the day that the value for another asset is X.

I've created a mock dataframe to illustrate my issue:

library(dplyr)
library(ggplot2)

the_date <- c('01-01-1990', '02-01-1990', '03-01-1990', '04-01-1990', '05-01-1990', '01-01-1990', '02-01-1990', '01-01-1990', '02-01-1990','03-01-1990')
the_asset <- c('AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT','AMZN', 'AMZN', 'AMZN')
the_price <- as.numeric(c(5,6,4,7,8,12,14,50,48,62))
the_returns <- as.numeric(c(0.1, -0.2, 0.14, 0.01, 0.05, -0.002, -0.11, 0.07, 0.08, 0.22))

test_df1 <- data.frame(the_date, the_asset, the_price, the_returns)

test_df1 <- test_df1 %>%
  group_by(the_asset) %>%
  mutate(quartile = ntile(the_returns, n=4))

I then plot the returns by quartile in ggplot:

test_df1 %>%
  group_by(quartile) %>%
  ggplot(aes(x = quartile, y = the_returns)) +
  geom_bar(stat = 'identity') +
  ggtitle('Returns by Quartile')

I would like to filter this plot in the following manner:


I've thought of making the dataframe into wide format to get each asset as a separate column however I'm not sure if that is the best option or how to proceed exactly.

Upvotes: 0

Views: 106

Answers (2)

ViviG
ViviG

Reputation: 1726

You can use SQL to approach that (I am sure you can do the same with dplyr, but I am better at sql...).

# Add a column with the count of dates that are the same
library(sqldf) # you might have to install other packages to make this one work

test_df1<-sqldf("SELECT count(*) OVER (PARTITION BY the_date) AS SAME_DATE, *  
      FROM test_df1 ")

And the plot:

ggplot(test_df1,
       aes(
         x = SAME_DATE,
         y = the_returns
         
       
       )) +
  geom_bar(stat = 'identity', aes(fill =the_asset)) 
  ggtitle('Returns by Quartile')

enter image description here

EDIT: Perhaps this is better:

ggplot(test_df1,
       aes(
         x = SAME_DATE,
         y = the_returns
         
       )) +
  geom_bar(stat = 'identity', alpha = 0.5) +
  geom_jitter(aes(shape = the_asset, color = the_asset),size = 4)
ggtitle('Returns by Date')

enter image description here

Upvotes: 1

Elena
Elena

Reputation: 155

This should work, but on the small dataset it does not return a nice plot, as there are too few rows left after filtering to the conditions you mentioned:

test_df1 %>%
  spread(the_asset,quartile, fill=0) %>%
  filter(AMZN==1) %>% 
  ggplot(aes(x = as.factor(AAPL), y = the_returns)) +
  geom_bar(stat = 'identity') +
  ggtitle('Returns by Quartile')

Upvotes: 1

Related Questions