Reputation: 25

R data.frame: Efficient way to create counter for next change of value in column

vector A:

a = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)

vector B: (only used for initialization)

b = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

Dataframe:

dft <- data.frame(a,b)

The following for-loop compares, for each row i, the value A[i] with the values that follow it in vector A. If A[i+1] is different, it writes "count" to B[i]; otherwise it checks A[i+2], incrementing "count" each time, and so on.

The idea is to know, for each row, the number of rows until the value in A changes.

count = 0

# takes forever on a large data set, but does its job

for(i in 1:nrow(dft)) {
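    # i+1:nrow(dft)-1 parses as i + (1:nrow(dft)) - 1, which runs past the last row;
    # i:nrow(dft) visits the same in-range rows without the out-of-range NA lookups
    for(j in i:nrow(dft)) {
        j_value <- dft[j,"a"]
        i_value <- dft[i,"a"]
        if (!is.na(j_value) && !is.na(i_value)){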
            tmp_value <- abs(i_value - j_value)
            if(tmp_value > 0) {
               dft[i,"b"] <- count
               count = 0
               break
            } else {
                count = count + 1
            }
        }
    }
}

Results should be:

Upvotes: 1

Answers (3)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

The following should work:

b = rle(a)
unlist(mapply(":", b$lengths, 1))
# [1] 5 4 3 2 1 1 2 1 3 2 1 5 4 3 2 1 1

Or in one line:

with(rle(a), unlist(Map(":", lengths, 1)))
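To make the run-length idea above easier to follow, here is a minimal sketch that inspects the runs first and then builds the same countdown with base R's sequence() instead of mapply/Map. The variable name runs is only illustrative, and this variant is an alternative formulation, not part of the original answer.

```r
a <- c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)

# rle() collapses the vector into runs of equal values:
# lengths 5 1 2 3 5 1, values 0 1 0 1 0 1
runs <- rle(a)

# sequence() counts up 1..n within each run; reversing the run lengths first
# and the result afterwards turns that into a countdown n..1 per run
b <- rev(sequence(rev(runs$lengths)))
b
# [1] 5 4 3 2 1 1 2 1 3 2 1 5 4 3 2 1 1
```

Because everything here is vectorised, no explicit loop over rows is needed.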

Using "data.table", you can do the following:

library(data.table)
data.table(a)[, b := .N:1, rleid(a)][]
#     a b
#  1: 0 5
#  2: 0 4
#  3: 0 3
#  4: 0 2
#  5: 0 1
#  6: 1 1
#  7: 0 2
#  8: 0 1
#  9: 1 3
# 10: 1 2
# 11: 1 1
# 12: 0 5
# 13: 0 4
# 14: 0 3
# 15: 0 2
# 16: 0 1
# 17: 1 1

Upvotes: 2

gfgm

Reputation: 3647

Here is another approach with lapply:

# The data
a=c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)

# An index of the data
ind <- 1:length(a)

# The function to apply
f <- function(x) {
  # Distance from position x to the nearest value change (NA if no change follows)
  d <- which(a[x] != a[x:length(a)])[1] - 1
  # If we are in the last group before the series ends, return the number of
  # rows remaining in that block; otherwise return the distance
  if (is.na(d)) length(a) - x + 1 else d
}

unlist(lapply(ind, f)) # Apply and unlist to vector
#>  [1] 5 4 3 2 1 1 2 1 3 2 1 5 4 3 2 1 1

If you wanted, you could reduce it to just the which() statement, in which case the last block of homogeneous values would be assigned NA. Depending on the context, you may want to treat the last block differently, because the number of repetitions until the value changes is censored there: the series ends before any change is observed. For example, you could supply a string such as '1+' as the second term of the ifelse.
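As a rough illustration of the censoring point, the sketch below computes the countdown with rle() and then overwrites the final run with NA, since no change is ever observed there. The names runs, last_len and b_censored are made up for this example, and marking the block as NA is only one of the possible treatments mentioned above.

```r
a <- c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)

# Countdown to the next change within each run of equal values
runs <- rle(a)
b_censored <- rev(sequence(rev(runs$lengths)))

# The last run never sees a change, so mark its rows as unknown (NA)
last_len <- runs$lengths[length(runs$lengths)]
b_censored[seq(length(a) - last_len + 1, length(a))] <- NA
b_censored
# [1]  5  4  3  2  1  1  2  1  3  2  1  5  4  3  2  1 NA
```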

Upvotes: 0

rosscova

Reputation: 5580

How about this, using data.table? It reverses the row order and uses shift to compare each value with the next one. It might be a little convoluted, but it seems to work.

library( data.table )
dft <- data.table(a)
dft[ , f := shift( a, 1L, fill = FALSE, type = "lead" ) != a
     ][ .N:1, b := seq_len(.N), by = cumsum(f)
     ][ , f := NULL ]
dft

    a b
 1: 0 5
 2: 0 4
 3: 0 3
 4: 0 2
 5: 0 1
 6: 1 1
 7: 0 2
 8: 0 1
 9: 1 3
10: 1 2
11: 1 1
12: 0 5
13: 0 4
14: 0 3
15: 0 2
16: 0 1
17: 1 1
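The steps in this answer (flag run ends, reverse, group, count) can be traced in base R without data.table. This is only a sketch of the same idea: the names f, grp and b are illustrative, ave() stands in for the grouped assignment, and the final row is simply assumed to be a run end rather than relying on the fill value.

```r
a <- c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)

# TRUE where the next value differs, i.e. at the last row of each run;
# the final row is treated as a run end by assumption
f <- c(a[-1] != a[-length(a)], TRUE)

# Walking backwards, every run end starts a new group
grp <- cumsum(rev(f))

# Count 1, 2, 3, ... within each group (still reversed), then restore the order
b <- rev(ave(seq_along(a), grp, FUN = seq_along))
b
# [1] 5 4 3 2 1 1 2 1 3 2 1 5 4 3 2 1 1
```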

Upvotes: 1
