Reputation: 2493
I need to delete xts rows based on certain criteria in the column [code]. It is fine if deleting rows leaves time gaps in the xts time series.
Question: How do I solve steps 1, 3, and 4?
The criteria are as follows:
Step-1: Value [3] in [code]: If the xts starts with [code] [3], delete that row.
Step-2: Value [0] in [code]: Delete the complete row.
Step-3: Value [2] in [code]: a) Keep the [2] that starts the xts; all lines above the first [2] should be removed. b) Keep any [2] that has a [3] above it.
Step-4: Value [3] in [code]: Keep only a [3] that has a [2] above it.
My solution for step-2:
This finds and keeps all [2] and [3] rows, thus removing all [0] rows:
xts3 <- xts3[grep("[2]|[3]", xts3$code), ]
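A possible alternative for step-2, offered as a minimal sketch: because the pattern "[2]|[3]" matches any value whose text contains a 2 or a 3 (for example 23), comparing the numeric values directly may be more robust. The small series `x` and its values are made up for this illustration only.

```r
library(xts)

# Illustrative series (values chosen for this sketch, not from the question).
idx <- as.POSIXct("2013-07-24 09:00:00", tz = "UTC") + 60 * (1:4)
x <- xts(cbind(code = c(3, 0, 2, 23)), order.by = idx)

# Compare the underlying numeric values instead of matching them as text.
keep <- as.numeric(coredata(x$code)) %in% c(2, 3)

# Keeps only the rows whose code is exactly 2 or 3.
x_kept <- x[keep, ]
x_kept
```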
My R-file:
library(xts) # Needed for xts().
dates <- as.POSIXct( # Construct the dates to be used.
c(
"2013-07-24 09:01:00",
"2013-07-24 09:02:00",
"2013-07-24 09:03:00",
"2013-07-24 09:04:00",
"2013-07-24 09:05:00",
"2013-07-24 09:06:00",
"2013-07-24 09:07:00",
"2013-07-24 09:08:00",
"2013-07-24 09:09:00"
)
)
code <- c(3, 2, 0, 2, 2, 2, 3, 3, 3) # Criteria for deleting/keeping rows.
data <- data.frame(code) # Create a dataframe.
xts3 <- xts(x=data, order.by=dates) # Create xts based on dataframe.
The result of the R-file (prior to deleting rows based on the criteria):
code
2013-07-24 09:01:00 3
2013-07-24 09:02:00 2
2013-07-24 09:03:00 0
2013-07-24 09:04:00 2
2013-07-24 09:05:00 2
2013-07-24 09:06:00 2
2013-07-24 09:07:00 3
2013-07-24 09:08:00 3
2013-07-24 09:09:00 3
Explanation: What should trigger deletion of rows (based on the criteria):
code
2013-07-24 09:01:00 3 # To be removed due to step-1.
2013-07-24 09:02:00 2 # To be kept due to step-3a.
2013-07-24 09:03:00 0 # To be removed due to step-2.
2013-07-24 09:04:00 2 # To be removed due to not fulfilling step-3b.
2013-07-24 09:05:00 2 # To be removed due to not fulfilling step-3b.
2013-07-24 09:06:00 2 # To be removed due to not fulfilling step-3b.
2013-07-24 09:07:00 3 # To be kept due to step-4.
2013-07-24 09:08:00 3 # To be removed due to not fulfilling step-4.
2013-07-24 09:09:00 3 # To be removed due to not fulfilling step-4.
Expected outcome after the rows have been deleted:
code
2013-07-24 09:02:00 2
2013-07-24 09:07:00 3
Upvotes: 0
Views: 472
Reputation: 23608
If you only have 0, 2, and 3 as values, you can use diff to get most of the rules in one go. Only those records are needed where the difference is 1 (a 2 above a 3) or -1 (a 3 above a 2), so the absolute value of diff is what we need. We also need the first row where the value is 2. We combine the two selections to get the result, xts3_filtered.
xts3_filtered <- c(xts3[first(which(xts3$code == 2))], xts3[abs(diff(xts3$code)) == 1])
code
2013-07-24 09:02:00 2
2013-07-24 09:02:00 2
2013-07-24 09:07:00 3
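To see why the diff condition picks these rows, it may help to look at the intermediate differences. The sketch below rebuilds the question's data (the dates are generated with `tz = "UTC"`, which is an assumption of this sketch) and prints the row-to-row differences.

```r
library(xts)

# Rebuild the example series from the question (UTC is assumed here).
dates <- as.POSIXct("2013-07-24 09:00:00", tz = "UTC") + 60 * (1:9)
xts3 <- xts(cbind(code = c(3, 2, 0, 2, 2, 2, 3, 3, 3)), order.by = dates)

# diff() gives the current value minus the previous row's value.
# The first row has no predecessor, so its difference is NA.
d <- as.numeric(diff(xts3$code))
d
# NA -1 -2  2  0  0  1  0  0

# Row positions where a 2 follows a 3 or a 3 follows a 2.
hits <- which(abs(d) == 1)
hits
# 2 7
```

Differences of 2 or 3 in absolute value involve a 0, and a difference of 0 is a repeated value, so neither passes the test.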
Now we have a duplicate row, because both rules select the record where the first 2 occurs. We remove any duplicates with the following code:
xts3_filtered[!duplicated(index(xts3_filtered))]
code
2013-07-24 09:02:00 2
2013-07-24 09:07:00 3
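As a variation on the approach above, both rules can be combined into one logical mask, so no duplicate row is created in the first place. This is only a sketch: `filter_codes` is a hypothetical helper name, it assumes a non-empty series with a column named code containing only 0, 2, and 3, and the dates are rebuilt with `tz = "UTC"` as an assumption.

```r
library(xts)

# Hypothetical helper: keep the first 2 plus every 2 <-> 3 transition row.
filter_codes <- function(x) {
  code <- as.numeric(coredata(x$code))
  d <- c(NA, diff(code))                    # current minus previous value
  transition <- !is.na(d) & abs(d) == 1     # a 2 after a 3, or a 3 after a 2
  first_two <- seq_along(code) == match(2, code)  # position of the first 2
  keep <- transition | (!is.na(first_two) & first_two)
  x[keep, ]
}

# Example data from the question (UTC is assumed here).
dates <- as.POSIXct("2013-07-24 09:00:00", tz = "UTC") + 60 * (1:9)
xts3 <- xts(cbind(code = c(3, 2, 0, 2, 2, 2, 3, 3, 3)), order.by = dates)

res <- filter_codes(xts3)
res
```

On the example data this keeps the 09:02:00 and 09:07:00 rows, the same rows as the expected outcome in the question.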
Upvotes: 1