Reputation: 327
I have a dataframe similar to the following one, organized by date:
|Symbol | Date | volume |price |
|------------------------------------
|A |2014-01-01 | 1 | 5 |
|A |2014-01-02 | 3 | 8 |
|A |2014-01-03 | 7 | 4 |
|A |2014-01-05 | 0 | 4 |
|A |2014-01-06 |0 | 4 |
|A |2014-01-07 |3 | 6 |
|A |2014-01-08 |34 | 7 |
|A |2014-01-09 |45 | 34 |
|A |2014-01-10 |4 | 5 |
|A |2014-01-11 |9 | 7 |
|A |2014-01-12 |0 | 7 |
|A |2014-01-13 |0 | 7 |
|A |2014-01-14 |8 | 6 |
|A |2014-01-15 |4 | 4 |
|A |2014-01-16 |0 | 7 |
|A |2014-01-17 |4 | 7 |
And I need to delete the rows that simultaneously satisfy that volume=0 and that in the price column have a value exactly equal to that of the previous row. To obtain a data frame like the following:
|Symbol | Date | volume |price |
|------------------------------------
|A |2014-01-01 | 1 | 5 |
|A |2014-01-02 | 3 | 8 |
|A |2014-01-03 | 7 | 4 |
|A |2014-01-07 |3 | 6 |
|A |2014-01-08 |34 | 7 |
|A |2014-01-09 |45 | 34 |
|A |2014-01-10 |4 | 5 |
|A |2014-01-11 |9 | 7 |
|A |2014-01-14 |8 | 6 |
|A |2014-01-15 |4 | 4 |
|A |2014-01-16 |0 | 7 |
|A |2014-01-17 |4 | 7 |
I guess this should be done with a for loop, but I really don't know how to do it. I am still very new to R. I hope you can help me.
Upvotes: 0
Views: 320
Reputation: 18632
In base R (this will preserve the row numbers):
df[(df$Volume != 0 | c(0, diff(df$Price) != 0)),]
Symbol Date Volume Price
1 A 1/1/2014 1 5
2 A 1/2/2014 3 8
3 A 1/3/2014 7 4
6 A 1/7/2014 3 6
7 A 1/8/2014 34 7
8 A 1/9/2014 45 34
9 A 1/10/2014 4 5
10 A 1/11/2014 9 7
13 A 1/14/2014 8 6
14 A 1/15/2014 4 4
15 A 1/16/2014 0 7
16 A 1/17/2014 4 7
From the library dplyr
you can use filter
and lag
:
library(dplyr)
dplyr::filter(df, Volume != 0 | Price != dplyr::lag(Price))
Symbol Date Volume Price
1 A 1/1/2014 1 5
2 A 1/2/2014 3 8
3 A 1/3/2014 7 4
4 A 1/7/2014 3 6
5 A 1/8/2014 34 7
6 A 1/9/2014 45 34
7 A 1/10/2014 4 5
8 A 1/11/2014 9 7
9 A 1/14/2014 8 6
10 A 1/15/2014 4 4
11 A 1/16/2014 0 7
12 A 1/17/2014 4 7
Upvotes: 2
Reputation: 101247
A base R option with subset
> subset(df,!(volume==0 & c(TRUE,diff(price)==0)))
Symbol Date volume price
1 A 2014-01-01 1 5
2 A 2014-01-02 3 8
3 A 2014-01-03 7 4
6 A 2014-01-07 3 6
7 A 2014-01-08 34 7
8 A 2014-01-09 45 34
9 A 2014-01-10 4 5
10 A 2014-01-11 9 7
13 A 2014-01-14 8 6
14 A 2014-01-15 4 4
15 A 2014-01-16 0 7
16 A 2014-01-17 4 7
A data.table
option
> setDT(df)[!(volume==0 & c(TRUE,diff(price)==0))]
Symbol Date volume price
1: A 2014-01-01 1 5
2: A 2014-01-02 3 8
3: A 2014-01-03 7 4
4: A 2014-01-07 3 6
5: A 2014-01-08 34 7
6: A 2014-01-09 45 34
7: A 2014-01-10 4 5
8: A 2014-01-11 9 7
9: A 2014-01-14 8 6
10: A 2014-01-15 4 4
11: A 2014-01-16 0 7
12: A 2014-01-17 4 7
Data
> dput(df)
structure(list(Symbol = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A"), Date = c("2014-01-01",
"2014-01-02", "2014-01-03", "2014-01-05", "2014-01-06", "2014-01-07",
"2014-01-08", "2014-01-09", "2014-01-10", "2014-01-11", "2014-01-12",
"2014-01-13", "2014-01-14", "2014-01-15", "2014-01-16", "2014-01-17"
), volume = c(1L, 3L, 7L, 0L, 0L, 3L, 34L, 45L, 4L, 9L, 0L, 0L,
8L, 4L, 0L, 4L), price = c(5L, 8L, 4L, 4L, 4L, 6L, 7L, 34L, 5L,
7L, 7L, 7L, 6L, 4L, 7L, 7L)), class = "data.frame", row.names = c(NA,
-16L))
Upvotes: 2