Reputation: 51
I want to make new variables that contain information around each individual event for a week: 3 days up to an event and 3 days after. What I have:
df <- Date APP DE10
2014-09-22 0 1.010
2014-09-19 0 1.043
2014-09-18 0 1.081
2014-09-17 0 1.050
2014-09-16 0 1.061
2014-09-15 0 1.067
2014-09-12 1 1.082
2014-09-11 0 1.041
2014-09-10 0 1.047
2014-09-09 0 0.996
2014-09-08 0 0.953
2014-09-05 0 0.928
2014-09-04 1 0.970
2014-09-03 0 0.955
2014-09-02 0 0.931
2014-09-01 0 0.882
What I want my dataset to look like:
APP Indicator DE10_Event1 DE10_Event2
0 1 1.050 0.996
0 2 1.061 0.996
0 3 1.067 0.996
1 4 1.082 0.970
0 5 1.041 0.955
0 6 1.047 0.931
0 7 0.996 0.882
I have been trying different things, but it has not worked out.
Edit
The dataset I want looks as the one above. I need it in order to plot multiple lines in a figure to compare market effects three days prior and after an event. In short, I want a figure like the one below but with more lines, each representing a development up until and after an event. APP=1
indicates when an event takes place, so I need Indicator=4
when APP=1
.
I hope this revision makes sense. Otherwise feel free to ask me again. I really appreciate any help.
Upvotes: 1
Views: 84
Reputation: 51
Thanks for your contributions!
For others who are looking for a solution to a similar problem, I combined the answers I got into:
df$APProws <- 1:nrow(df) #Variable with row numbers
events_rows <- df %>% filter(APP==1) %>% select(APProws) #Indicator for row number for event, APP=1
Subtract the two variables to get row distance from event:
diffs <- data.frame(df %>%
mutate(Event1_DE10=df$APProws-events_rows$APProws[1]) %>%
mutate(Event2_DE10=df$APProws-events_rows$APProws[2]))
Plot the figure:
diffs %>% ggplot() +
geom_line(aes(x=Event1_DE10,y=DE10), color="blue") +
geom_line(aes(x=Event2_DE10,y=DE10), color="red") +
geom_vline(xintercept=0, linetype="dashed") +
scale_x_continuous(limits=c(-3,3)) +
scale_y_continuous(limits=c(0.3,0.7))
Here is the result:
Upvotes: 1
Reputation: 1001
Your question is still not clear, but if I understood correctly, what you need is not to create a new variable, but to select the days around each APP occurrence. I'm posting a new answer, since that is a different question.
Reading your example data:
df <- read.table( text = c('
Date APP DE10
2014-09-22 0 1.010
2014-09-19 0 1.043
2014-09-18 0 1.081
2014-09-17 0 1.050
2014-09-16 0 1.061
2014-09-15 0 1.067
2014-09-12 1 1.082
2014-09-11 0 1.041
2014-09-10 0 1.047
2014-09-09 0 0.996
2014-09-08 0 0.953
2014-09-05 0 0.928
2014-09-04 1 0.970
2014-09-03 0 0.955
2014-09-02 0 0.931
2014-09-01 0 0.882' ),
header = TRUE )
Now identify where you have APPs and get the data around it. There's certainly more elegant ways to do this, but this would do. It will create a new data.frame with all you need for your plots:
# Itentify the rows where APP is 1:
APProws <- as.numeric(rownames( df[ df[,'APP'] == 1, ] ))
# An empty data.frame to receive the data:
APP.df <- data.frame(
Event = rep(NA, length(APProws)*7),
Date = as.Date('2000-12-31'),
DE10 = NA,
Indicator = NA )
n <- 0
for( i in APProws ) {
rows <- (n*7+1):(n*7+7)
APP.df$Event[rows] <- paste('Event', n+1)
APP.df$Date[rows] <- df$Date[(i-3):(i+3)]
APP.df$DE10[rows] <- df$DE10[(i-3):(i+3)]
APP.df$Indicator[n*7+4] <- '1'
n <- n+1
}; rm(i, n, rows)
Now you have everithing you need for your plot.
library(ggplot2)
ggplot(APP.df, aes(Date, DE10)) +
geom_line() +
geom_vline(
data = subset(APP.df, Indicator == 1),
aes(xintercept = as.numeric(Date)),
color = 'red' ) +
facet_grid( ~Event, scales = 'free_x')
If this is what you need, I really recommend that you edit the title of your post, since it's misleading. Describe what you trying to accomplish, not the way you think it's the way to get there.
Upvotes: 1
Reputation: 1001
Reading your example data:
df <- read.table( text = c('
Date APP DE10
2014-09-22 0 1.010
2014-09-19 0 1.043
2014-09-18 0 1.081
2014-09-17 0 1.050
2014-09-16 0 1.061
2014-09-15 0 1.067
2014-09-12 1 1.082
2014-09-11 0 1.041
2014-09-10 0 1.047
2014-09-09 0 0.996
2014-09-08 0 0.953
2014-09-05 0 0.928
2014-09-04 1 0.970
2014-09-03 0 0.955
2014-09-02 0 0.931
2014-09-01 0 0.882' ),
header = TRUE )
Add week day number:
df$Weekday <- strftime(df$Date, '%u')
Now count the number of events per APP and week day, then spread your data. You can try using dplyr/tidyr. I'm accustomed to data.table, so:
library(data.table)
df <- as.data.table(df)
df[ , Event := paste0('DE10_Event', 1:.N) , by = .(APP, Weekday) ]
df.s <- dcast(df, APP + Weekday ~ Event, value.var = 'DE10')
> df.s
APP Weekday DE10_Event1 DE10_Event2 DE10_Event3 DE10_Event4
1: 0 1 1.010 1.067 0.953 0.882
2: 0 2 1.061 0.996 0.931 NA
3: 0 3 1.050 1.047 0.955 NA
4: 0 4 1.081 1.041 NA NA
5: 0 5 1.043 0.928 NA NA
6: 1 4 0.970 NA NA NA
7: 1 5 1.082 NA NA NA
Upvotes: 1