DBT
DBT

Reputation: 47

Constructing Loop for Calculating Difference in R

I have the following dataset in which I need to construct a loop to calculate the differences in time from one row referencing the prior row. Once the difference is calculated I want to place that value in a new column called "Difference". The main part that I am stuck on is writing a loop to calculate the differences from when a trade is closed to when it was opened. This is denoted by buy/sell (opened) to (t/p or s/l) (closed). I have a large data frame so I would want to apply the loop across the entire data frame.

I have generated a sample data frame of what I would like the loop to do. Any suggestions would be extremely helpful.

This is my desired df: Trade Data Example

This is the current structure of my data:

structure(list(ID = 1:10, Year = c(2005L, 2005L, 2005L, 2005L, 
2005L, 2005L, 2005L, 2005L, 2005L, 2005L), Month = c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Day = c(7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L), Time = c("4:00", "5:30", "6:00", "8:20", 
"9:00", "9:06", "10:00", "10:20", "11:00", "11:20"), x3 = c("buy", 
"t/p", "sell", "t/p", "buy", "t/p", "sell", "t/p", "buy", "t/p"
), x4 = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L), x5 = c(5.06, 
5.06, 3, 3, 5.3, 5.3, 3, 3, 5.53, 5.53), x6 = c(1.88219, 1.88319, 
1.88357, 1.88257, 1.88149, 1.88249, 1.88167, 1.88067, 1.88089, 
1.88189), x7 = c(1.87664, 1.87664, 1.90464, 1.90464, 1.87664, 
1.87664, 1.90464, 1.90464, 1.87664, 1.87664), x8 = c(1.88319, 
1.88319, 1.88257, 1.88257, 1.88249, 1.88249, 1.88067, 1.88067, 
1.88189, 1.88189), x9 = c(0, 342.41, 0, 203.03, 0, 358.64, 0, 
203.03, 0, 374.21), x10 = c(12000, 12342.41, 12342.41, 12545.44, 
12545.44, 12904.08, 12904.08, 13107.11, 13107.11, 13481.32), 
    Difference = c("", "1:30", "", "2:20", "", "0:06", "", "0:20", 
    "", "0:20")), row.names = c(NA, 10L), class = "data.frame")

Upvotes: 1

Views: 60

Answers (1)

HubertL
HubertL

Reputation: 19544

If you have an ID for the pair of rows (like x4 in you example), you can use it to group_by.

Then you can do the difference of minimum and maximum time in that group, and assign it to rows that are the maximum time.

df %>% 
  group_by( x4 ) %>%
  mutate(difference = ifelse(time == max(time), max(time) - min(time), NA))

Upvotes: 1

Related Questions