Narjems
Narjems

Reputation: 111

R- create dataset by removing duplicates based on a condition - filter

I have a data frame where for each day, I have several prices. I would like to modify my data frame with the following code :

newdf <- Data %>%    
 filter(
if (Data$Date == Data$Echeance) {
  Data$Close == lag(Data$Close,1)
} else {
  Data$Close == Data$Close
}
) 

However, it is not giving me what I want, that is : create a new data frame where the variable Close takes its normal value, unless the day of Date is equal to the day of Echeance. In this case, take the following Close value. I added filter because I wanted to remove the duplicate dates, and keep only one date per day where Close satisfies the condition above.

There is no error message, it just doesn't give me the right database.

Here is a glimpse of my data:

 Date                Echeance            Compens.  Open  Haut   Bas Close 

1 1998-03-27 00:00:00 1998-09-10 00:00:00     125.   828   828   820  820.   197     
2 1998-03-27 00:00:00 1998-11-10 00:00:00     128.   847   847   842  842.   124     
3 1998-03-27 00:00:00 1999-01-11 00:00:00     131.   858   858   858  858.     2     
4 1998-03-30 00:00:00 1998-09-10 00:00:00     125.   821   821   820  820.    38     
5 1998-03-30 00:00:00 1998-11-10 00:00:00     129.   843   843   843  843.     1     
6 1998-03-30 00:00:00 1999-01-11 00:00:00     131.   860   860   860  860.     5     

Thanks a lot in advance.

Upvotes: 0

Views: 153

Answers (1)

RLave
RLave

Reputation: 8374

Sounds like a use case for ifelse, with dplyr:

library(dplyr)
Data %>%
    mutate(Close = ifelse(Date==Echeance, lead(Close,1), Close))

Here an example:

dat %>% 
  mutate(var_new = ifelse(date1==date2, lead(var,1), var))

# A tibble: 3 x 4
# date1      date2        var var_new
# <date>     <date>     <int>   <int>
# 1 2018-03-27 2018-03-27    10      11
# 2 2018-03-28 2018-01-01    11      11
# 3 2018-03-29 2018-02-01    12      12

The function lead will move the vector by 1 position. Also note that I created a var_new just to show the difference, but you can mutate directly var.

Data used:

dat <- tibble(date1 = seq(from=as.Date("2018-03-27"), to=as.Date("2018-03-29"), by="day"),
              date2 = c(as.Date("2018-03-27"), as.Date("2018-01-01"), as.Date("2018-02-01")),
              var = 10:12)
dat
# A tibble: 3 x 3
# date1      date2        var
# <date>     <date>     <int>
# 1 2018-03-27 2018-03-27    10
# 2 2018-03-28 2018-01-01    11
# 3 2018-03-29 2018-02-01    12

Upvotes: 1

Related Questions