Reputation: 47
I have the following dataset in which I need to construct a loop to calculate the differences in time from one row referencing the prior row. Once the difference is calculated I want to place that value in a new column called "Difference". The main part that I am stuck on is writing a loop to calculate the differences from when a trade is closed to when it was opened. This is denoted by buy/sell (opened) to (t/p or s/l) (closed). I have a large data frame so I would want to apply the loop across the entire data frame.
I have generated a sample data frame of what I would like the loop to do. Any suggestions would be extremely helpful.
This is my desired df: Trade Data Example
This is the current structure of my data:
structure(list(ID = 1:10, Year = c(2005L, 2005L, 2005L, 2005L,
2005L, 2005L, 2005L, 2005L, 2005L, 2005L), Month = c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Day = c(7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L), Time = c("4:00", "5:30", "6:00", "8:20",
"9:00", "9:06", "10:00", "10:20", "11:00", "11:20"), x3 = c("buy",
"t/p", "sell", "t/p", "buy", "t/p", "sell", "t/p", "buy", "t/p"
), x4 = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L), x5 = c(5.06,
5.06, 3, 3, 5.3, 5.3, 3, 3, 5.53, 5.53), x6 = c(1.88219, 1.88319,
1.88357, 1.88257, 1.88149, 1.88249, 1.88167, 1.88067, 1.88089,
1.88189), x7 = c(1.87664, 1.87664, 1.90464, 1.90464, 1.87664,
1.87664, 1.90464, 1.90464, 1.87664, 1.87664), x8 = c(1.88319,
1.88319, 1.88257, 1.88257, 1.88249, 1.88249, 1.88067, 1.88067,
1.88189, 1.88189), x9 = c(0, 342.41, 0, 203.03, 0, 358.64, 0,
203.03, 0, 374.21), x10 = c(12000, 12342.41, 12342.41, 12545.44,
12545.44, 12904.08, 12904.08, 13107.11, 13107.11, 13481.32),
Difference = c("", "1:30", "", "2:20", "", "0:06", "", "0:20",
"", "0:20")), row.names = c(NA, 10L), class = "data.frame")
Upvotes: 1
Views: 60
Reputation: 19544
If you have an ID for the pair of rows (like x4
in you example), you can use it to group_by
.
Then you can do the difference of minimum and maximum time in that group, and assign it to rows that are the maximum time.
df %>%
group_by( x4 ) %>%
mutate(difference = ifelse(time == max(time), max(time) - min(time), NA))
Upvotes: 1