Reputation: 3021
I have a dataframe:
Time CardID Data Type
1 2018-01-01 10:44:35 10037479 PowerOn STBEvent
2 2018-01-01 10:44:48 10037479 0401 UseRemote
3 2018-01-01 10:44:53 10037479 0301 UseRemote
4 2018-01-01 10:45:13 10037479 0401 UseRemote
5 2018-01-01 10:45:24 10037479 0301 UseRemote
6 2018-01-01 10:45:30 10037479 1415 LiveView
7 2018-01-01 10:45:37 10037479 0401 UseRemote
8 2018-01-01 11:08:01 10037479 1412 LiveView
9 2018-01-01 11:08:13 10037479 0401 UseRemote
10 2018-01-01 11:14:31 10037479 0301 UseRemote
structure(list(Time = structure(c(1514783675, 1514783688, 1514783693,
1514783713, 1514783724, 1514783730, 1514783737, 1514785081, 1514785093,
1514785471), class = c("POSIXct", "POSIXt")), CardID = c("10037479",
"10037479", "10037479", "10037479", "10037479", "10037479", "10037479",
"10037479", "10037479", "10037479"), Data = c("PowerOn", "0401",
"0301", "0401", "0301", "1415", "0401", "1412", "0401", "0301"
), Type = c("STBEvent", "UseRemote", "UseRemote", "UseRemote",
"UseRemote", "LiveView", "UseRemote", "LiveView", "UseRemote",
"UseRemote")), .Names = c("Time", "CardID", "Data", "Type"), row.names = c(NA,
10L), class = "data.frame")
I am using lead and lag functions from dplyr to fetch data points before and after a particular row. For example I am using this:
ae1 <- which(dplyr::lag(df$Data)=="1415")+1
This gets me row number 6 from the above data frame where the Type is equal to "LiveView". I understand by altering the integer at the end of the code, I could get the respective line from the data frame. My question is: can I use the same / similar function wherein I can fetch data the next "LiveView" - which is row number 8. I can very well do
ae1 <- which(dplyr::lag(df$Data)=="1415")+3
to get row number 8. But the next LiveView type can occur in any row other than 8. I am thinking something similar to
ae1 <- which(dplyr::lag(df$Data)=="1415")+nrow(where Type == next "LiveView")
Upvotes: 1
Views: 316
Reputation: 66819
I am using lead and lag functions from dplyr to fetch data points before and after a particular row. [...] can I use the same / similar function wherein I can fetch data the next "LiveView" [?]
If for each instance of Data == 1415, you want to find the next row where CardID matches, Type matches, and Time is greater, then there's...
library(data.table)
setDT(df)
mdf = df[Data == "1415", .(CardID, Type, Time)]
w = df[mdf, on=.(CardID, Type, Time > Time), mult="first", which=TRUE]
df[w]
# Time CardID Data Type
# 1: 2018-01-01 00:38:01 10037479 1412 LiveView
If you have duplicate times, then Time
won't work as a row number. You can add a row number like df[, rn := .I]
or df[, rn := rowid(CardID)]
and use that instead.
The line with on=
is a non-equi join, currently not available in dplyr, which is why I'm posting with a different package here.
If you want to return both rows....
w0 = df[Data == "1415", which=TRUE]
w = df[df[w0], on=.(CardID, Type, Time > Time), mult="first", which=TRUE]
df[matrix(c(w0, w), 2, byrow=TRUE)]
# Time CardID Data Type
# 1: 2018-01-01 00:15:30 10037479 1415 LiveView
# 2: 2018-01-01 00:38:01 10037479 1412 LiveView
Or if you also want data points before the row:
wb = df[df[w0], on=.(CardID, Type, Time < Time), mult="first", which=TRUE]
df[matrix(c(wb, w0, w), 3, byrow=TRUE)]
# Time CardID Data Type
# 1: <NA> <NA> <NA> <NA>
# 2: 2018-01-01 00:15:30 10037479 1415 LiveView
# 3: 2018-01-01 00:38:01 10037479 1412 LiveView
NAs are shown since no row meets those criteria.
Upvotes: 1
Reputation: 270248
1) If the objective is to find the row number of the first LiveWire
row past the first 1415 row then use the conjunction shown followed by which
and first
to get the row numbers and the first of those row numbers. Note that cummany
is true for the first Data
component of 1415 onwards and by lagging it we get true only for those after it to the end. If we knew that the there were only one such row then we could omit the first
. Since dplyr's lag conflicts with lag
in the base we use dplyr::lag
to be sure we are using the desired one.
df %>%
{ dplyr::lag(cumany(.$Data == 1415)) & .$Type == "LiveView" } %>%
which %>%
first
## [1] 8
2) If we wanted the row itself instead then use filter
and slice
. If we knew there were only one such row we could omit the slice
:
df %>%
filter(dplyr::lag(cumany(Data == 1415)) & Type == "LiveView") %>%
slice(1)
## Time CardID Data Type
## 1 2018-01-01 00:38:01 10037479 1412 LiveView
Note that if we added a row number to df
by replacing the first line of code with:
df %>% mutate(n = 1:n()) %>%
then the above code would also give the row number in the n
column in addition to the row itself in the other columns.
2a) An alternative to (2) is that we could first filter by cumany(Data == 1415)
to give all rows from the first 1415 row onwards and then remove the first row since we only want rows after it and then find the LiveView
rows within that and take the first.
df %>%
filter(cumany(Data == 1415)) %>%
slice(-1) %>%
filter(Type == "LiveView") %>%
slice(1)
## Time CardID Data Type
## 1 2018-01-01 00:38:01 10037479 1412 LiveView
Revised.
Upvotes: 1
Reputation: 2793
One way to do this would be to dplyr::group_by
the data based on Type
, dplyr::filter
the Type
of interest, and then dplyr::slice
to the position you want, in this case, position 2
:
library(dplyr)
df <-
structure(
list(
Time =
structure(c(1514783675, 1514783688, 1514783693,
1514783713, 1514783724, 1514783730, 1514783737, 1514785081, 1514785093,
1514785471), class = c("POSIXct", "POSIXt")),
CardID = c("10037479", "10037479", "10037479", "10037479", "10037479", "10037479", "10037479",
"10037479", "10037479", "10037479"),
Data = c("PowerOn", "0401", "0301", "0401", "0301", "1415", "0401", "1412", "0401", "0301"),
Type = c("STBEvent", "UseRemote", "UseRemote", "UseRemote",
"UseRemote", "LiveView", "UseRemote", "LiveView", "UseRemote",
"UseRemote")),
.Names = c("Time", "CardID", "Data", "Type"),
row.names = c(NA, 10L),
class = "data.frame")
df %>%
group_by(Type) %>%
filter(Type %in% 'LiveView') %>%
slice(2)
Upvotes: 0