Reputation: 437
I have a time series with values at Time = 23:00:00 that are always wrong, so I need to change those values.
Sample Data:
data <- data.table(
Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
TotalVolume = c(5,20,15,11,19,4)
)
Which looks like:
Time Open TotalVolume
1: 20:47:00 21.306 5
2: 20:52:00 21.305 20
3: 21:25:00 21.305 15
4: 22:25:00 21.300 11
5: 23:00:00 22.900 19
6: 01:02:00 21.286 4
I would like to replace the Open value when Time = 23:00:00 with the Open value immediately preceding it. Which should look like this:
Time Open TotalVolume
1: 20:47:00 21.306 5
2: 20:52:00 21.305 20
3: 21:25:00 21.305 15
4: 22:25:00 21.300 11
5: 23:00:00 21.300 19
6: 01:02:00 21.286 4
I have tried using the lag function without the desired results:
data$Open[data$Time == "23:00:00"] <- lag(data,1)
and
data$Open[data$Time == "23:00:00"] <- lag(data$Open[data$Time == "23:00:00"],1)
Upvotes: 2
Views: 65
Reputation: 2960
> n <- which(data$Time=="23:00:00")
> data$Open[n] <- data$Open[n-1]
> data
Time Open TotalVolume
1: 20:47:00 21.306 5
2: 20:52:00 21.305 20
3: 21:25:00 21.305 15
4: 22:25:00 21.300 11
5: 23:00:00 21.300 19
6: 01:02:00 21.286 4
>
n
contains the position(s) where the Time is "23:00:00", so n-1 are the positions immediately preceeding the "23:00:00"-positions. Hence the assignment data$Open[n] <- data$Open[n-1]
does what we want.
Upvotes: 3
Reputation: 10473
Try using this:
library(dplyr)
> data <- data.table(
+ Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
+ Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
+ TotalVolume = c(5,20,15,11,19,4)
+ )
> data <- data %>% mutate(Open = ifelse(Time == '23:00:00', lag(Open), Open))
> data
Time Open TotalVolume
1: 20:47:00 21.306 5
2: 20:52:00 21.305 20
3: 21:25:00 21.305 15
4: 22:25:00 21.300 11
5: 23:00:00 21.300 19
6: 01:02:00 21.286 4
You can also use this way without the dplyr mutate function like this:
> data <- data.table(
+ Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
+ Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
+ TotalVolume = c(5,20,15,11,19,4)
+ )
> data$Open <- ifelse(data$Time == '23:00:00', lag(data$Open), data$Open)
> data
Time Open TotalVolume
1: 20:47:00 21.306 5
2: 20:52:00 21.305 20
3: 21:25:00 21.305 15
4: 22:25:00 21.300 11
5: 23:00:00 21.300 19
6: 01:02:00 21.286 4
Upvotes: 2