Reputation: 695
I'm trying to get the exact row where the last_200_low column changes AND the value 3 rows behind is the same as the value in the current row. I got 2 different errors and one solution that almost satisfies the problem. Any tips?
> df
data_hora last_200_low
4070 2018-02-01 09:39:20 3200.5
4071 2018-02-01 09:39:20 3200.5
4072 2018-02-01 09:39:20 3200.5
4073 2018-02-01 09:39:20 3200.5
4074 2018-02-01 09:39:23 3200.0
4075 2018-02-01 09:39:24 3199.5
4076 2018-02-01 09:39:25 3199.0
4077 2018-02-01 09:39:25 3198.5
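To make the example reproducible, the data frame printed above can be rebuilt roughly like this. This is a sketch: storing data_hora as POSIXct in UTC is an assumption, since the question does not say which class or time zone the column uses.

```r
# Rebuild the printed data frame. POSIXct in UTC is an assumption;
# the question does not state the class or time zone of data_hora.
df <- data.frame(
  data_hora = as.POSIXct(c(rep("2018-02-01 09:39:20", 4),
                           "2018-02-01 09:39:23", "2018-02-01 09:39:24",
                           "2018-02-01 09:39:25", "2018-02-01 09:39:25"),
                         tz = "UTC"),
  last_200_low = c(rep(3200.5, 4), 3200.0, 3199.5, 3199.0, 3198.5),
  row.names = 4070:4077  # keep the original row numbers
)
df
```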
The outcome I'm looking for is this:
data_hora last_200_low
[1]2018-02-01 09:39:20 3200.5
and not this:
df[diff(df$last_200_low) != 0, ]
4073 2018-02-01 09:39:20 3200.5
4074 2018-02-01 09:39:23 3200.0
4075 2018-02-01 09:39:24 3199.5
4076 2018-02-01 09:39:25 3199.0
The other solutions that I tried:
1st one:
i <- 1
if (diff(df$last_200_low) != 0 & df$last_200_low[i] == df$last_200_low[i - 3]) {
print(df[i])
i <- i + 1
}
Warning message:
In if (diff(df$last_200_low) != 0 & df$last_200_low[i] == :
the condition has length > 1 and only the first element will be used
2nd one:
if (diff(df$last_200_low[i]) != 0 & df$last_200_low[i] == df$last_200_low[i - 3]) {
print(df[i])
i <- i + 1
}
Error in if (diff(df$last_200_low[i]) != 0 & df$last_200_low[i] == :
argument is of length zero
I tried multiple ways to create a vector with the diff function, but it always returns a vector with one fewer element than the number of rows in the data frame.
Upvotes: 0
Views: 130
Reputation: 988
I misread the question at first, so this is modified to show the last of the matching values, as requested.
Sample data:
data_hora last_200_low
1 2018-02-01 09:39:20 3200.5
2 2018-02-01 09:39:20 3200.5
3 2018-02-01 09:39:20 3200.5
4 2018-02-01 09:39:23 3200.0
5 2018-02-01 09:39:24 3199.5
The code below should give the last of the rows with the same value at the top of your data, using lag and tail:
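library(dplyr)  # lag() here is dplyr::lag(); stats::lag() does not shift a plain vector
dfindex <- df$last_200_low == lag(df$last_200_low)
tail(df[which(dfindex), ], 1)  # which() drops the NA produced for the first row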
Result:
data_hora last_200_low
3 2018-02-01 09:39:20 3200.5
dput output for the sample data:
df <- structure(list(data_hora = structure(c(1L, 1L, 1L, 2L, 3L), .Label = c("2018-02-01 09:39:20",
"2018-02-01 09:39:23", "2018-02-01 09:39:24"), class = "factor"),
last_200_low = c(3200.5, 3200.5, 3200.5, 3200, 3199.5)), class = "data.frame", row.names = c(NA,
-5L))
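The lag/tail approach above only compares each row with the previous one. A minimal sketch of the full rule from the question (the next value changes AND the value 3 rows back is the same), still using dplyr's lead() and lag(), could look like this. It assumes the question's 8-row data; the data_hora column is left out because the rule only uses last_200_low, and the row column is added here just to keep the original row numbers visible.

```r
library(dplyr)

# Question's data; data_hora omitted because the rule only uses last_200_low
df <- data.frame(
  row = 4070:4077,
  last_200_low = c(rep(3200.5, 4), 3200.0, 3199.5, 3199.0, 3198.5)
)

# TRUE where the next value differs from the current one
changes_next <- lead(df$last_200_low) != df$last_200_low
# TRUE where the value 3 rows back equals the current one
same_3_back <- lag(df$last_200_low, n = 3) == df$last_200_low

# which() drops the NA comparisons that lead()/lag() produce at both ends
df[which(changes_next & same_3_back), ]
```

On this data only the row numbered 4073 satisfies both conditions.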
Upvotes: 0
Reputation: 1114
You can achieve what you want using data.table and the shift function:
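library(data.table)
# keep.rownames = "row" keeps the original row numbers (4070, 4071, ...) as a column
df <- as.data.table(df, keep.rownames = "row")
# x: the next value differs; y: the value 3 rows back is the same
df <- df[, `:=`(x = shift(last_200_low, 1, type = "lead") != last_200_low,
                y = shift(last_200_low, 3, type = "lag") == last_200_low)
         ][x & y, .(row, data_hora, last_200_low)]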
> df
data_hora last_200_low
4070 01-02-18 9:39 3200.5
Simple one-liner. Hope it helps!
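To see why this works, it can help to look at what shift() returns on its own. A small sketch on the question's values (the vector x is just the last_200_low column typed in by hand): unlike diff(), both shifted vectors keep the full length and pad the missing positions with NA, so they can be compared element by element with the original column.

```r
library(data.table)

x <- c(rep(3200.5, 4), 3200.0, 3199.5, 3199.0, 3198.5)

nxt  <- shift(x, 1, type = "lead")  # next row's value; NA for the last row
back <- shift(x, 3, type = "lag")   # value 3 rows back; NA for the first 3 rows

# Same length as x, so the comparisons line up row by row
which(nxt != x & back == x)
```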
Upvotes: 0
Reputation: 93
Using data.table:
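library(data.table)
dt <- as.data.table(df)
# timestamps that occur at least 3 times
func <- function(dt) unique(dt[, .(N = .N), by = data_hora][N >= 3]$data_hora)
# rows where the next value differs and whose timestamp is one of those
dt[c(diff(last_200_low), 0) != 0 & data_hora %in% func(dt)]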
Upvotes: 0
Reputation: 11255
To get a vector of the same length as your data.frame, you can pad the diff result with a logical:
c(diff(df$last_200_low) != 0, FALSE)
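A quick check of the lengths involved, using the question's values typed in as a plain vector x: diff() has one element fewer than the input, and appending FALSE restores the full length. The padded FALSE means the last row is never flagged as "changing", which is reasonable because it has no next value to compare against.

```r
x <- c(rep(3200.5, 4), 3200.0, 3199.5, 3199.0, 3198.5)

d <- diff(x)                 # x[i + 1] - x[i], so length(x) - 1 elements
changed <- c(d != 0, FALSE)  # pad the last row, which has no next value

length(d)        # 7
length(changed)  # 8
which(changed)   # rows whose next value differs: 4 5 6 7
```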
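The problem with the second one is somewhat similar. With i <- 1, df$last_200_low[i - 3] evaluates to df$last_200_low[-2], which drops the second element instead of looking 3 rows back, so you are no longer comparing two single values. On top of that, diff(df$last_200_low[i]) is the diff of a single value, which is numeric(0); that zero-length condition is what triggers "argument is of length zero". See the return: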
> DF$last_200_low[-2]
[1] 3200.5 3200.5 3200.5 3200.0 3199.5 3199.0 3198.5
# versus DF$last_200_low[i] with i = 1
> DF$last_200_low[1]
[1] 3200.5
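If you do want to keep a loop like the ones in the question, every if() condition must be a single TRUE/FALSE. A minimal sketch under that constraint, assuming the DF defined further down and n_lag = 3 (hits is a name introduced here for illustration): the loop starts at n_lag + 1 so that i - n_lag is a valid positive index, and stops one row early so that x[i + 1] exists. It also assumes DF has more than n_lag + 1 rows.

```r
DF <- data.frame(ID = 4070:4077,
                 last_200_low = c(rep(3200.5, 4), 3200.0, 3199.5, 3199.0, 3198.5))
x <- DF$last_200_low
n_lag <- 3
hits <- integer(0)

# Assumes length(x) > n_lag + 1; otherwise seq() would count backwards
for (i in seq(n_lag + 1, length(x) - 1)) {
  # each comparison is a single value, so && and if() behave as expected
  if (x[i + 1] != x[i] && x[i - n_lag] == x[i]) {
    hits <- c(hits, i)
  }
}
DF[hits, ]
```

The vectorized version is usually preferable in R, but the loop makes the two conditions explicit.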
You can use the same idea of padding with c() to get a properly sized vector for subsetting:
c(rep(FALSE, 3), diff(DF$last_200_low, lag = 3) == 0)
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
Putting it all together gives you:
DF <- data.frame(ID = 4070:4077, last_200_low = c(rep(3200.5,4), 3200.0, 3199.5, 3199.0, 3198.5))
n_lag <- 3
DF[c(diff(DF$last_200_low) !=0, FALSE)
& c(rep(FALSE, n_lag), diff(DF$last_200_low, lag = n_lag) == 0)
, ]
ID last_200_low
4 4073 3200.5
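The same two padded vectors can be wrapped in a small function so that n_lag is a parameter. This is only a sketch; rows_before_change is a hypothetical name, not a function from any package. Truncating each padded vector with seq_len(n) keeps the lengths correct even when the input is shorter than n_lag. Note that it compares values with exact equality, which is fine for values like 3200.5 but may need a tolerance for computed floating-point values.

```r
# Hypothetical helper: indices where the next value changes AND the value
# n_lag rows back equals the current value.
rows_before_change <- function(x, n_lag = 3) {
  n <- length(x)
  # next value differs (the last row has no next value, so it is FALSE)
  changes_next <- c(diff(x) != 0, FALSE)[seq_len(n)]
  # value n_lag rows back is equal (the first n_lag rows are FALSE)
  same_n_back <- c(rep(FALSE, n_lag), diff(x, lag = n_lag) == 0)[seq_len(n)]
  which(changes_next & same_n_back)
}

DF <- data.frame(ID = 4070:4077,
                 last_200_low = c(rep(3200.5, 4), 3200.0, 3199.5, 3199.0, 3198.5))
DF[rows_before_change(DF$last_200_low, n_lag = 3), ]
```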
Upvotes: 3