Valeria Arango
Valeria Arango

Reputation: 327

Delete rows from a data table depending from the values on the previous one

I have a dataframe similar to the following one, organized by date:

|Symbol |   Date    | volume |price |
|------------------------------------
|A      |2014-01-01 | 1      |   5  |
|A      |2014-01-02 | 3      |   8  |
|A      |2014-01-03 | 7      |   4  |
|A      |2014-01-05 | 0      |  4   |
|A      |2014-01-06 |0       |   4  |
|A      |2014-01-07 |3       |   6  |
|A      |2014-01-08 |34      |   7  |
|A      |2014-01-09 |45      |  34  |
|A      |2014-01-10 |4       |   5  | 
|A      |2014-01-11 |9       |   7  |
|A      |2014-01-12 |0       |   7  |
|A      |2014-01-13 |0       |   7  | 
|A      |2014-01-14 |8       |   6  |
|A      |2014-01-15 |4       |   4  |
|A      |2014-01-16 |0       |   7  |
|A      |2014-01-17 |4       |   7  |

And I need to delete the rows that simultaneously satisfy that volume=0 and that in the price column have a value exactly equal to that of the previous row. To obtain a data frame like the following:

|Symbol |   Date    | volume |price |
|------------------------------------
|A      |2014-01-01 | 1      |   5  |
|A      |2014-01-02 | 3      |   8  |
|A      |2014-01-03 | 7      |   4  |
|A      |2014-01-07 |3       |   6  |
|A      |2014-01-08 |34      |   7  |
|A      |2014-01-09 |45      |  34  |
|A      |2014-01-10 |4       |   5  | 
|A      |2014-01-11 |9       |   7  |
|A      |2014-01-14 |8       |   6  |
|A      |2014-01-15 |4       |   4  |
|A      |2014-01-16 |0       |   7  |
|A      |2014-01-17 |4       |   7  |

I guess this should be done with a for loop, but I really don't know how to do it. I am still very new to R. I hope you can help me.

Upvotes: 0

Views: 320

Answers (2)

LMc
LMc

Reputation: 18632

In base R (this will preserve the row numbers):

df[(df$Volume != 0 | c(0, diff(df$Price) != 0)),] 

   Symbol      Date Volume Price
1       A  1/1/2014      1     5
2       A  1/2/2014      3     8
3       A  1/3/2014      7     4
6       A  1/7/2014      3     6
7       A  1/8/2014     34     7
8       A  1/9/2014     45    34
9       A 1/10/2014      4     5
10      A 1/11/2014      9     7
13      A 1/14/2014      8     6
14      A 1/15/2014      4     4
15      A 1/16/2014      0     7
16      A 1/17/2014      4     7

From the library dplyr you can use filter and lag:

library(dplyr)

dplyr::filter(df, Volume != 0 | Price != dplyr::lag(Price))

   Symbol      Date Volume Price
1       A  1/1/2014      1     5
2       A  1/2/2014      3     8
3       A  1/3/2014      7     4
4       A  1/7/2014      3     6
5       A  1/8/2014     34     7
6       A  1/9/2014     45    34
7       A 1/10/2014      4     5
8       A 1/11/2014      9     7
9       A 1/14/2014      8     6
10      A 1/15/2014      4     4
11      A 1/16/2014      0     7
12      A 1/17/2014      4     7

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101247

A base R option with subset

> subset(df,!(volume==0 & c(TRUE,diff(price)==0)))
   Symbol       Date volume price
1       A 2014-01-01      1     5
2       A 2014-01-02      3     8
3       A 2014-01-03      7     4
6       A 2014-01-07      3     6
7       A 2014-01-08     34     7
8       A 2014-01-09     45    34
9       A 2014-01-10      4     5
10      A 2014-01-11      9     7
13      A 2014-01-14      8     6
14      A 2014-01-15      4     4
15      A 2014-01-16      0     7
16      A 2014-01-17      4     7

A data.table option

> setDT(df)[!(volume==0 & c(TRUE,diff(price)==0))]
    Symbol       Date volume price
 1:      A 2014-01-01      1     5
 2:      A 2014-01-02      3     8
 3:      A 2014-01-03      7     4
 4:      A 2014-01-07      3     6
 5:      A 2014-01-08     34     7
 6:      A 2014-01-09     45    34
 7:      A 2014-01-10      4     5
 8:      A 2014-01-11      9     7
 9:      A 2014-01-14      8     6
10:      A 2014-01-15      4     4
11:      A 2014-01-16      0     7
12:      A 2014-01-17      4     7

Data

> dput(df)
structure(list(Symbol = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A"), Date = c("2014-01-01",
"2014-01-02", "2014-01-03", "2014-01-05", "2014-01-06", "2014-01-07",
"2014-01-08", "2014-01-09", "2014-01-10", "2014-01-11", "2014-01-12",
"2014-01-13", "2014-01-14", "2014-01-15", "2014-01-16", "2014-01-17"
), volume = c(1L, 3L, 7L, 0L, 0L, 3L, 34L, 45L, 4L, 9L, 0L, 0L,
8L, 4L, 0L, 4L), price = c(5L, 8L, 4L, 4L, 4L, 6L, 7L, 34L, 5L,
7L, 7L, 7L, 6L, 4L, 7L, 7L)), class = "data.frame", row.names = c(NA,
-16L))

Upvotes: 2

Related Questions