Reputation: 2491
Question: How can I clean data from "duplicates" within blocks. I use the term [blocks] to illustrate that 2 values (in same column) are equal and are positioned either above of below.
In column [c1] I have the values [2] and [3].
Value [2] should never have value [2] under.
Value [3] should never have value [3] under.
I cannot use a standard duplication removal function, because there will be duplicates in the column. It is not possible to delete the rows manual since they will be in amount of thousands.
If possible it would be good to solve without loading any R packages.
My R-file:
##########
# Test xts
##########
dates <- as.POSIXct(c
(
"2013-07-24 09:01:00",
"2013-07-24 09:02:00",
"2013-07-24 09:03:00",
"2013-07-24 09:04:00",
"2013-07-24 09:05:00",
"2013-07-24 09:06:00",
"2013-07-24 09:07:00"
)
)
c1 <- c(2,3,2,2,3,3,2) # Data in c1.
# c2 <- c(0,3,2,2,3,0,2) # Data in c2.
data <- data.frame(c1) # Create a dataframe.
xts9 <- xts(x=data, order.by=dates) # Create xts based on dataframe.
The result of running the R-file:
c1
2013-07-24 09:01:00 2
2013-07-24 09:02:00 3
2013-07-24 09:03:00 2
2013-07-24 09:04:00 2
2013-07-24 09:05:00 3
2013-07-24 09:06:00 3
2013-07-24 09:07:00 2
Comments of which lines should be deleted:
c1
2013-07-24 09:01:00 2
2013-07-24 09:02:00 3
2013-07-24 09:03:00 2
2013-07-24 09:04:00 2 # To be remove due to having a 2 above.
2013-07-24 09:05:00 3
2013-07-24 09:06:00 3 # To be remove due to having a 2 above.
2013-07-24 09:07:00 2
Upvotes: 1
Views: 43
Reputation: 389175
We can use rleid
function from data.table
and then use duplicated
to remove the repeating rows.
library(data.table)
xts9[!duplicated(rleid(xts9)), ]
# c1
#2013-07-24 09:01:00 2
#2013-07-24 09:02:00 3
#2013-07-24 09:03:00 2
#2013-07-24 09:05:00 3
#2013-07-24 09:07:00 2
If you want to do this in base R, we can use rle
instead using the same logic
x <- rle(rowSums(xts9))
xts9[!duplicated(rep(seq_along(x$values), x$lengths)), ]
# c1
#2013-07-24 09:01:00 2
#2013-07-24 09:02:00 3
#2013-07-24 09:03:00 2
#2013-07-24 09:05:00 3
#2013-07-24 09:07:00 2
Upvotes: 1