Creating column to identify "sessions" within data frame

Question

I have a large set of transaction data that tracks purchases, returns, and when the point-of-sale operator clears the transaction after receiving payment/giving refund. I would like to be able to number sessions based on when the cashier "clears" the screen, and have all transactions occurring between clears numbered the same.

I pulled out all the non-essential data, but here is the dput() for what it looks like:

my.data.1<-structure(list(TOTSND_Clear = c("0", "0", "0", "0", "0", "0", 
"4.00", "0", "0", "10.00", "0", "0", "12.00", "0", "-5.00"), 
    TOTSND_UNBAL = c("0", "1.00", "0", "0", "0", "0", "0", "0", 
    "0", "0", "0", "0", "0", "0", "0")), .Names = c("TOTSND_Clear", 
"TOTSND_UNBAL"), row.names = c(NA, 15L), class = "data.frame")

It looks like this:

TOTSND_Clear    TOTSND_UNBAL
    0             0
    0             1.00
    0             0
    0             0
    0             0
    0             0
    4.00          0

All of those zeros signify some other form of transaction occurring, be it a sale or a refund. When either TOTSND_Clear or TOTSND_UNBAL have a value, it means that the transaction instance is ending. Those numbers are dollar amounts, not counts of transaction types (it just happened to work out to look like that on this example).

I would like to produce these results:

my.data.results<-structure(list(TOTSND_Clear = c("0", "0", "0", "0", "0", "0", 
"4.00", "0", "0", "10.00", "0", "0", "12.00", "0", "-5.00"), 
    TOTSND_UNBAL = c("0", "1.00", "0", "0", "0", "0", "0", "0", 
    "0", "0", "0", "0", "0", "0", "0"), session = c(1, 1, 2, 
    2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)), .Names = c("TOTSND_Clear", 
"TOTSND_UNBAL", "session"), row.names = c(NA, 15L), class = "data.frame")

Looking like this:

TOTSND_Clear    TOTSND_UNBAL    session
    0              0              1
    0              1.00           1
    0              0              2
    0              0              2
    0              0              2
    0              0              2
    4.00           0              2

I would put code, but I have no idea where to begin. I've found ways to number the instances, but not to assign the same number to fields that occurred before the data was cleared, but after the previous clear.

Sven Hohenstein · Accepted Answer

Here's one way:

c(1, cumsum(diff(as.logical(rowSums(
  my.data.1[c("TOTSND_Clear", "TOTSND_UNBAL")] != 0))) < 0) + 1)

# [1] 1 1 2 2 2 2 2 3 3 3 4 4 4 5 5

Creating column to identify "sessions" within data frame

Answers (2)

Related Questions

Creating column to identify &quot;sessions&quot; within data frame

Answers (2)

Related Questions

Creating column to identify "sessions" within data frame