Artur

Reputation: 695

"if" with two conditions errors

I'm trying to get the exact row where the last_200_low column changes AND the row 3 rows back has the same value as the current row. I got two different errors and one solution that almost satisfies the problem. Any tips?

> df
               data_hora last_200_low
4070 2018-02-01 09:39:20       3200.5
4071 2018-02-01 09:39:20       3200.5
4072 2018-02-01 09:39:20       3200.5
4073 2018-02-01 09:39:20       3200.5
4074 2018-02-01 09:39:23       3200.0
4075 2018-02-01 09:39:24       3199.5
4076 2018-02-01 09:39:25       3199.0
4077 2018-02-01 09:39:25       3198.5

the outcome that I'm looking for is this:

             data_hora last_200_low
[1]2018-02-01 09:39:20       3200.5

and not this...

df[diff(df$last_200_low) != 0, ]

4073 2018-02-01 09:39:20       3200.5
4074 2018-02-01 09:39:23       3200.0
4075 2018-02-01 09:39:24       3199.5
4076 2018-02-01 09:39:25       3199.0

The other solutions that I tried:

1st one:

i <- 1
if (diff(df$last_200_low) != 0 & df$last_200_low[i] == df$last_200_low[i - 3]) {
     print(df[i])
     i <- i + 1
 }

Warning message:
In if (diff(df$last_200_low) != 0 & df$last_200_low[i] ==  :
  the condition has length > 1 and only the first element will be used

2nd one:

 if (diff(df$last_200_low[i]) != 0 & df$last_200_low[i] == df$last_200_low[i - 3]) {
     print(df[i])
     i <- i + 1
 }

Error in if (diff(df$last_200_low[i]) != 0 & df$last_200_low[i] ==  : 
  argument is of length zero

I tried multiple ways to create a vector with the diff function, but it always produces a vector with one fewer value than the data frame has rows.

Upvotes: 0

Views: 130

Answers (4)

Russ Thomas

Reputation: 988

UPDATE 1

Misread the question, so modifying to show the last of the matching values as requested.

Sample data:

            data_hora last_200_low
1 2018-02-01 09:39:20       3200.5
2 2018-02-01 09:39:20       3200.5
3 2018-02-01 09:39:20       3200.5
4 2018-02-01 09:39:23       3200.0
5 2018-02-01 09:39:24       3199.5

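The code below should give the last row in the run of identical values at the top of your data, using lag (from dplyr) and tail:

library(dplyr)  # lag() here is dplyr::lag, which shifts the vector by one position
dfindex <- (df$last_200_low == lag(df$last_200_low))
tail(df[dfindex,],1)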

Result:

            data_hora last_200_low
3 2018-02-01 09:39:20       3200.5

dput for Sample data:

df <- structure(list(data_hora = structure(c(1L, 1L, 1L, 2L, 3L), .Label = c("2018-02-01 09:39:20", 
"2018-02-01 09:39:23", "2018-02-01 09:39:24"), class = "factor"), 
    last_200_low = c(3200.5, 3200.5, 3200.5, 3200, 3199.5)), class = "data.frame", row.names = c(NA, 
-5L))

Upvotes: 0

COLO

Reputation: 1114

You can achieve what you want using data.table and the shift function:

library(data.table)
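df <- as.data.table(df, keep.rownames = "row")  # keep the original row names as a column called row

# x flags rows whose next value differs; y flags rows equal to the value 3 rows back
df <- df[, ':='(x = ifelse(shift(last_200_low, 1, type = 'lead') != last_200_low, 1, 0),
                y = ifelse(shift(last_200_low, 3, type = 'lag') == last_200_low, 1, 0))][x + y > 1, list(row, data_hora, last_200_low)]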


> df
     data_hora last_200_low 
4070 01-02-18 9:39       3200.5 

Simple one-liner code! Hope it helps!

Upvotes: 0

Andrew

Reputation: 93

Using data.table:

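library('data.table')
dt <- as.data.table(df)  # the lines below assume the data is a data.table named dt
# timestamps that occur at least 3 times
func <- function(dt) unique(dt[, .(N = .N), by = data_hora][N >= 3]$data_hora)
# rows whose next value differs and whose timestamp is one of those
dt[c(diff(last_200_low),0) != 0 & data_hora %in% func(dt)]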

Upvotes: 0

Cole

Reputation: 11255

To get a vector equal to the length of your data.frame, you can combine with a logical:

c(diff(df$last_200_low) != 0, FALSE)

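The problem with the second one is somewhat similar. With i = 1, df$last_200_low[i - 3] evaluates to df$last_200_low[-2], which drops the second element instead of looking three rows back, and that isn't what you want. See the return: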

> DF$last_200_low[-2]
[1] 3200.5 3200.5 3200.5 3200.0 3199.5 3199.0 3198.5

# versus df$last_200_low[i]
> DF$last_200_low[1]
[1] 3200.5
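
To see where both messages in the question come from, it can help to check the lengths involved. This is only a sketch: x is a stand-in vector holding the question's last_200_low values, and cond is a name made up for illustration.

```r
# Stand-in for df$last_200_low, using the values from the question
x <- c(3200.5, 3200.5, 3200.5, 3200.5, 3200.0, 3199.5, 3199.0, 3198.5)
i <- 1

length(diff(x))     # 7: one shorter than x, so the first if() condition has length > 1
length(diff(x[i]))  # 0: diff() of a single value is empty

# An empty operand makes the whole `&` expression empty, which is why the
# second attempt fails with "argument is of length zero"
cond <- diff(x[i]) != 0 & x[i] == x[i - 3]
length(cond)        # 0
```

Note that from R 4.2.0 onward, a condition of length greater than one in if() is an error rather than a warning.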

You can use the same idea of combining and padding to get a properly sized vector for subsetting:

c(rep(FALSE, 3), diff(DF$last_200_low, lag = 3) == 0)
[1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

Putting it all together gives you:

DF <- data.frame(ID = 4070:4077, last_200_low = c(rep(3200.5,4), 3200.0, 3199.5, 3199.0, 3198.5))
n_lag <- 3

DF[c(diff(DF$last_200_low) !=0, FALSE) 
   & c(rep(FALSE, n_lag), diff(DF$last_200_low, lag = n_lag) == 0)
   , ]

    ID last_200_low
4 4073       3200.5
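
If you would rather keep an explicit loop with if, as in the question, here is a minimal sketch of the same two checks done one row at a time. The names hits, changes_next and same_as_lagged are made up for illustration, and it assumes the data has more than n_lag + 1 rows.

```r
DF <- data.frame(ID = 4070:4077,
                 last_200_low = c(rep(3200.5, 4), 3200.0, 3199.5, 3199.0, 3198.5))
n_lag <- 3
hits <- integer(0)

# Start at n_lag + 1 so that i - n_lag is a valid row,
# and stop one row early so that i + 1 exists
for (i in (n_lag + 1):(nrow(DF) - 1)) {
  changes_next   <- DF$last_200_low[i + 1] != DF$last_200_low[i]
  same_as_lagged <- DF$last_200_low[i] == DF$last_200_low[i - n_lag]
  if (changes_next && same_as_lagged) {  # && works on single values, which is what if() needs
    hits <- c(hits, i)
  }
}

DF[hits, ]
```

On this sample data it selects the same row as the vectorized version (ID 4073). The vectorized form is usually preferable in R, but the loop makes each condition a single TRUE or FALSE.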

Upvotes: 3
