datahappy
datahappy

Reputation: 856

Creating column to identify "sessions" within data frame

I have a large set of transaction data that tracks purchases, returns, and when the point-of-sale operator clears the transaction after receiving payment/giving refund. I would like to be able to number sessions based on when the cashier "clears" the screen, and have all transactions occurring between clears numbered the same.

I pulled out all the non-essential data, but here is the dput() for what it looks like:

my.data.1<-structure(list(TOTSND_Clear = c("0", "0", "0", "0", "0", "0", 
"4.00", "0", "0", "10.00", "0", "0", "12.00", "0", "-5.00"), 
    TOTSND_UNBAL = c("0", "1.00", "0", "0", "0", "0", "0", "0", 
    "0", "0", "0", "0", "0", "0", "0")), .Names = c("TOTSND_Clear", 
"TOTSND_UNBAL"), row.names = c(NA, 15L), class = "data.frame")

It looks like this:

TOTSND_Clear    TOTSND_UNBAL
    0             0
    0             1.00
    0             0
    0             0
    0             0
    0             0
    4.00          0

All of those zeros signify some other form of transaction occurring, be it a sale or a refund. When either TOTSND_Clear or TOTSND_UNBAL have a value, it means that the transaction instance is ending. Those numbers are dollar amounts, not counts of transaction types (it just happened to work out to look like that on this example).

I would like to produce these results:

my.data.results<-structure(list(TOTSND_Clear = c("0", "0", "0", "0", "0", "0", 
"4.00", "0", "0", "10.00", "0", "0", "12.00", "0", "-5.00"), 
    TOTSND_UNBAL = c("0", "1.00", "0", "0", "0", "0", "0", "0", 
    "0", "0", "0", "0", "0", "0", "0"), session = c(1, 1, 2, 
    2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)), .Names = c("TOTSND_Clear", 
"TOTSND_UNBAL", "session"), row.names = c(NA, 15L), class = "data.frame")

Looking like this:

TOTSND_Clear    TOTSND_UNBAL    session
    0              0              1
    0              1.00           1
    0              0              2
    0              0              2
    0              0              2
    0              0              2
    4.00           0              2

I would put code, but I have no idea where to begin. I've found ways to number the instances, but not to assign the same number to fields that occurred before the data was cleared, but after the previous clear.

Upvotes: 0

Views: 50

Answers (2)

Sven Hohenstein
Sven Hohenstein

Reputation: 81693

Here's one way:

c(1, cumsum(diff(as.logical(rowSums(
  my.data.1[c("TOTSND_Clear", "TOTSND_UNBAL")] != 0))) < 0) + 1)

# [1] 1 1 2 2 2 2 2 3 3 3 4 4 4 5 5

Upvotes: 2

joran
joran

Reputation: 173567

Maybe something like this...?

ind <- which(with(my.data.1,TOTSND_Clear != 0 | TOTSND_UNBAL != 0))
> rep(seq_along(ind),times = c(ind[1],diff(ind)))
 [1] 1 1 2 2 2 2 2 3 3 3 4 4 4 5 5

which you can then add as a column.

Upvotes: 2

Related Questions