Reputation:
I have a dataframe:
ID operation
A1 open
A1 open
A1 close
A1
A1 open
A1 close
B2
B2 open
B2 open
B2 open
B2 close
B2 upload
B2 open
B2 close
B2 open
B2 close
I want to add index for each bundle of "open" and "close" in column operation. So for each row between open and close must have same index. So desired result is:
ID operation index
A1 open 1
A1 open 1
A1 close 1
A1
A1 open 2
A1 close 2
B2
B2 open 3
B2 open 3
B2 open 3
B2 close 3
B2 upload
B2 open 4
B2 close 4
B2 open 5
B2 close 5
How could i do that? Which function could be used for these and how to apply it? Preferably by using data.table
Upvotes: 2
Views: 105
Reputation: 26238
Taking leads from the mt1022's fantastic solution, I am translating the answer for dplyr
users
df %>% mutate(index = dense_rank(-rev(cumsum(rev(operation) == 'close')))) %>%
group_by(ID, index) %>% mutate(index = ifelse(cumsum(operation == 'open') > 0, index, NA)) %>%
ungroup()
# A tibble: 16 x 3
ID operation index
<chr> <chr> <int>
1 A1 open 1
2 A1 open 1
3 A1 close 1
4 A1 <NA> NA
5 A1 open 2
6 A1 close 2
7 B2 <NA> NA
8 B2 open 3
9 B2 open 3
10 B2 open 3
11 B2 close 3
12 B2 upload NA
13 B2 open 4
14 B2 close 4
15 B2 open 5
16 B2 close 5
Upvotes: 1
Reputation: 102760
Here is a data.table
option which perhaps helps
setDT(df)[
,
index := ifelse(
operation %in% c("open", "close"),
cumsum(shift(operation == "close", fill = TRUE)),
NA
)
]
giving
> df
ID operation index
1: A1 open 1
2: A1 open 1
3: A1 close 1
4: A1 NA
5: A1 open 2
6: A1 close 2
7: B2 NA
8: B2 open 3
9: B2 open 3
10: B2 open 3
11: B2 close 3
12: B2 upload NA
13: B2 open 4
14: B2 close 4
15: B2 open 5
16: B2 close 5
Data
> dput(df)
structure(list(ID = c("A1", "A1", "A1", "A1", "A1", "A1", "B2",
"B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2"), operation = c("open",
"open", "close", "", "open", "close", "", "open", "open", "open",
"close", "upload", "open", "close", "open", "close")), class = "data.frame", row.names = c(NA,
-16L))
Upvotes: 0
Reputation: 17299
Here is a way to do it in data.table
:
dt[, index := .GRP, by = .(rev(cumsum(rev(operation) == 'close')))]
dt[, index := ifelse(cumsum(operation == 'open') > 0, index, NA), by = .(ID, index)]
dt
# ID operation index
# 1: A1 open 1
# 2: A1 open 1
# 3: A1 close 1
# 4: A1 NA
# 5: A1 open 2
# 6: A1 close 2
# 7: B2 NA
# 8: B2 open 3
# 9: B2 open 3
# 10: B2 open 3
# 11: B2 close 3
# 12: B2 upload NA
# 13: B2 open 4
# 14: B2 close 4
# 15: B2 open 5
# 16: B2 close 5
Due to the two path of cumsum
, if an operation is not in c('open', 'close')
, it will:
open
and close
;NA
if it is between close
and open
.Upvotes: 2
Reputation: 5491
Here is a naive solution:
rank <- df$operation == "close" & !is.na(df$operation)
df$index <- cumsum(c(1, rank[-length(rank)]))
df$index[!df$operation %in% c("open", "close")] <- NA
df
ID operation index
1 A1 open 1
2 A1 open 1
3 A1 close 1
4 A1 <NA> NA
5 A1 open 2
6 A1 close 2
7 B2 <NA> NA
8 B2 open 3
9 B2 open 3
10 B2 open 3
11 B2 close 3
12 B2 open 4
13 B2 close 4
14 B2 open 5
15 B2 close 5
Upvotes: 0
Reputation: 389315
Here's a solution with for
loop :
df$index <- NA
index <- 0
flag <- FALSE
for(i in seq_len(nrow(df))) {
if(df$operation[i] == '') next
else if(df$operation[i] == 'open' & !flag) {
index <- index + 1
flag <- TRUE
}
else if(df$operation[i] == 'close' & flag) flag <- FALSE
df$index[i] <- index
}
df
# ID operation index
#1 A1 open 1
#2 A1 open 1
#3 A1 close 1
#4 A1 NA
#5 A1 open 2
#6 A1 close 2
#7 B2 NA
#8 B2 open 3
#9 B2 open 3
#10 B2 open 3
#11 B2 close 3
#12 B2 open 4
#13 B2 close 4
#14 B2 open 5
#15 B2 close 5
data
df <- structure(list(ID = c("A1", "A1", "A1", "A1", "A1", "A1", "B2",
"B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2"), operation = c("open",
"open", "close", "", "open", "close", "", "open", "open", "open",
"close", "open", "close", "open", "close")),
class = "data.frame", row.names = c(NA, -15L))
Upvotes: 0