JHall651
JHall651

Reputation: 437

Change a Specific Value to a Lagged Value in r

I have a time series with values at Time = 23:00:00 that are always wrong, so I need to change those values.

Sample Data:

data <- data.table(
    Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
    Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
    TotalVolume = c(5,20,15,11,19,4) 
    )

Which looks like:

   Time        Open     TotalVolume
1: 20:47:00    21.306    5
2: 20:52:00    21.305    20
3: 21:25:00    21.305    15
4: 22:25:00    21.300    11
5: 23:00:00    22.900    19
6: 01:02:00    21.286    4

I would like to replace the Open value when Time = 23:00:00 with the Open value immediately preceding it. Which should look like this:

   Time        Open     TotalVolume
1: 20:47:00    21.306    5
2: 20:52:00    21.305    20
3: 21:25:00    21.305    15
4: 22:25:00    21.300    11
5: 23:00:00    21.300    19
6: 01:02:00    21.286    4

I have tried using the lag function without the desired results:

data$Open[data$Time == "23:00:00"] <- lag(data,1)

and

data$Open[data$Time == "23:00:00"] <- lag(data$Open[data$Time == "23:00:00"],1)

Upvotes: 2

Views: 65

Answers (2)

mra68
mra68

Reputation: 2960

> n <- which(data$Time=="23:00:00")
> data$Open[n] <- data$Open[n-1]
> data
       Time   Open TotalVolume
1: 20:47:00 21.306           5
2: 20:52:00 21.305          20
3: 21:25:00 21.305          15
4: 22:25:00 21.300          11
5: 23:00:00 21.300          19
6: 01:02:00 21.286           4
> 

n contains the position(s) where the Time is "23:00:00", so n-1 are the positions immediately preceeding the "23:00:00"-positions. Hence the assignment data$Open[n] <- data$Open[n-1] does what we want.

Upvotes: 3

Gopala
Gopala

Reputation: 10473

Try using this:

library(dplyr)
> data <- data.table(
+     Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
+     Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
+     TotalVolume = c(5,20,15,11,19,4) 
+ )
> data <- data %>% mutate(Open = ifelse(Time == '23:00:00', lag(Open), Open))
> data
       Time   Open TotalVolume
1: 20:47:00 21.306           5
2: 20:52:00 21.305          20
3: 21:25:00 21.305          15
4: 22:25:00 21.300          11
5: 23:00:00 21.300          19
6: 01:02:00 21.286           4

You can also use this way without the dplyr mutate function like this:

> data <- data.table(
+     Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
+     Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
+     TotalVolume = c(5,20,15,11,19,4) 
+ )
> data$Open <- ifelse(data$Time == '23:00:00', lag(data$Open), data$Open)
> data
       Time   Open TotalVolume
1: 20:47:00 21.306           5
2: 20:52:00 21.305          20
3: 21:25:00 21.305          15
4: 22:25:00 21.300          11
5: 23:00:00 21.300          19
6: 01:02:00 21.286           4

Upvotes: 2

Related Questions