rfairy
rfairy

Reputation: 51

Creating new variables by subsetting data

I want to make new variables that contain information around each individual event for a week: 3 days up to an event and 3 days after. What I have:

df <- Date   APP  DE10 
  2014-09-22 0    1.010 
  2014-09-19 0    1.043
  2014-09-18 0    1.081
  2014-09-17 0    1.050
  2014-09-16 0    1.061
  2014-09-15 0    1.067
  2014-09-12 1    1.082
  2014-09-11 0    1.041
  2014-09-10 0    1.047
  2014-09-09 0    0.996
  2014-09-08 0    0.953
  2014-09-05 0    0.928
  2014-09-04 1    0.970
  2014-09-03 0    0.955
  2014-09-02 0    0.931
  2014-09-01 0    0.882

What I want my dataset to look like:

 APP      Indicator  DE10_Event1 DE10_Event2
 0        1          1.050      0.996
 0        2          1.061      0.996
 0        3          1.067      0.996
 1        4          1.082      0.970
 0        5          1.041      0.955
 0        6          1.047      0.931
 0        7          0.996      0.882

I have been trying different things, but it has not worked out.

Edit

The dataset I want looks as the one above. I need it in order to plot multiple lines in a figure to compare market effects three days prior and after an event. In short, I want a figure like the one below but with more lines, each representing a development up until and after an event. APP=1 indicates when an event takes place, so I need Indicator=4 when APP=1.

I hope this revision makes sense. Otherwise feel free to ask me again. I really appreciate any help.

enter image description here

Upvotes: 1

Views: 84

Answers (3)

rfairy
rfairy

Reputation: 51

Thanks for your contributions!

For others who are looking for a solution to a similar problem, I combined the answers I got into:

df$APProws <- 1:nrow(df) #Variable with row numbers
events_rows <- df %>% filter(APP==1) %>% select(APProws) #Indicator for row number for event, APP=1

Subtract the two variables to get row distance from event:

diffs <- data.frame(df %>% 
    mutate(Event1_DE10=df$APProws-events_rows$APProws[1]) %>% 
    mutate(Event2_DE10=df$APProws-events_rows$APProws[2])) 

Plot the figure:

diffs %>% ggplot() + 
  geom_line(aes(x=Event1_DE10,y=DE10), color="blue") +
  geom_line(aes(x=Event2_DE10,y=DE10), color="red") + 
  geom_vline(xintercept=0, linetype="dashed")  +
  scale_x_continuous(limits=c(-3,3)) +
  scale_y_continuous(limits=c(0.3,0.7))

Here is the result:

Bond yield development

Upvotes: 1

Carlos Eduardo Lagosta
Carlos Eduardo Lagosta

Reputation: 1001

Your question is still not clear, but if I understood correctly, what you need is not to create a new variable, but to select the days around each APP occurrence. I'm posting a new answer, since that is a different question.

Reading your example data:

df <- read.table( text = c('
  Date   APP  DE10 
  2014-09-22 0    1.010 
  2014-09-19 0    1.043
  2014-09-18 0    1.081
  2014-09-17 0    1.050
  2014-09-16 0    1.061
  2014-09-15 0    1.067
  2014-09-12 1    1.082
  2014-09-11 0    1.041
  2014-09-10 0    1.047
  2014-09-09 0    0.996
  2014-09-08 0    0.953
  2014-09-05 0    0.928
  2014-09-04 1    0.970
  2014-09-03 0    0.955
  2014-09-02 0    0.931
  2014-09-01 0    0.882' ),
  header = TRUE )

Now identify where you have APPs and get the data around it. There's certainly more elegant ways to do this, but this would do. It will create a new data.frame with all you need for your plots:

# Itentify the rows where APP is 1:
APProws <- as.numeric(rownames( df[ df[,'APP'] == 1, ] ))

# An empty data.frame to receive the data:
APP.df <- data.frame(
  Event = rep(NA, length(APProws)*7),
  Date = as.Date('2000-12-31'),
  DE10 = NA,
  Indicator = NA )

n <- 0
for( i in APProws ) {

  rows <- (n*7+1):(n*7+7)

  APP.df$Event[rows] <- paste('Event', n+1)
  APP.df$Date[rows] <- df$Date[(i-3):(i+3)]
  APP.df$DE10[rows] <- df$DE10[(i-3):(i+3)]
  APP.df$Indicator[n*7+4] <- '1'

  n <- n+1
}; rm(i, n, rows)

Now you have everithing you need for your plot.

library(ggplot2)

ggplot(APP.df, aes(Date, DE10)) +
  geom_line() +
  geom_vline(
    data = subset(APP.df, Indicator == 1),
    aes(xintercept = as.numeric(Date)),
    color = 'red' ) +
  facet_grid( ~Event, scales = 'free_x')

enter image description here

If this is what you need, I really recommend that you edit the title of your post, since it's misleading. Describe what you trying to accomplish, not the way you think it's the way to get there.

Upvotes: 1

Carlos Eduardo Lagosta
Carlos Eduardo Lagosta

Reputation: 1001

Reading your example data:

df <- read.table( text = c('
  Date   APP  DE10 
  2014-09-22 0    1.010 
  2014-09-19 0    1.043
  2014-09-18 0    1.081
  2014-09-17 0    1.050
  2014-09-16 0    1.061
  2014-09-15 0    1.067
  2014-09-12 1    1.082
  2014-09-11 0    1.041
  2014-09-10 0    1.047
  2014-09-09 0    0.996
  2014-09-08 0    0.953
  2014-09-05 0    0.928
  2014-09-04 1    0.970
  2014-09-03 0    0.955
  2014-09-02 0    0.931
  2014-09-01 0    0.882' ),
  header = TRUE )

Add week day number:

df$Weekday <- strftime(df$Date, '%u')

Now count the number of events per APP and week day, then spread your data. You can try using dplyr/tidyr. I'm accustomed to data.table, so:

library(data.table)

df <- as.data.table(df)

df[ , Event := paste0('DE10_Event', 1:.N) , by = .(APP, Weekday) ]

df.s <- dcast(df, APP + Weekday ~ Event, value.var = 'DE10')

> df.s
   APP Weekday DE10_Event1 DE10_Event2 DE10_Event3 DE10_Event4
1:   0       1       1.010       1.067       0.953       0.882
2:   0       2       1.061       0.996       0.931          NA
3:   0       3       1.050       1.047       0.955          NA
4:   0       4       1.081       1.041          NA          NA
5:   0       5       1.043       0.928          NA          NA
6:   1       4       0.970          NA          NA          NA
7:   1       5       1.082          NA          NA          NA

Upvotes: 1

Related Questions