Reputation: 25
vector A:
a = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)
vector B: (only used for initialization)
b = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Dataframe:
dft <- data.frame(a,b)
The following for-loop compares, for each row "i", the value A[i] with A[i+1] in vector A. If the value at i+1 is different, it writes "count"; otherwise it checks i+2 and increments "count", and so on.
The idea is to know, for each row, the number of rows until the value in A changes.
# Takes forever on a large data set, but does its job
for (i in seq_len(nrow(dft))) {
  count <- 0                         # rows counted so far for row i
  i_value <- dft[i, "a"]
  for (j in i:nrow(dft)) {           # start at row i so the count includes row i itself
    j_value <- dft[j, "a"]
    if (!is.na(j_value) && !is.na(i_value)) {
      if (abs(i_value - j_value) > 0) {
        dft[i, "b"] <- count         # value changed: store the distance and stop
        break
      } else {
        count <- count + 1
      }
    }
  }
}
Results should be:
b
1: 5
2: 4
3: 3
4: 2
5: 1
6: 1
7: 2
8: 1
9: 3
10: 2
11: 1
12: 5
13: 4
14: 3
15: 2
16: 1
17: 0
Upvotes: 1
Views: 172
Reputation: 193517
The following should work:
b = rle(a)
unlist(mapply(":", b$lengths, 1))
# [1] 5 4 3 2 1 1 2 1 3 2 1 5 4 3 2 1 1
Or in one line:
with(rle(a), unlist(Map(":", lengths, 1)))
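To see why this works, it may help to look at the intermediate rle() result. The sketch below is only illustrative: the names runs, b2, b0 and n_last are not from the answer, and the last step assumes that the 0 shown for row 17 in the question's expected output is wanted for the whole final run.

```r
a <- c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)

runs <- rle(a)
runs$lengths  # 5 1 2 3 5 1
runs$values   # 0 1 0 1 0 1

# Count down from each run length to 1, then concatenate
b <- unlist(lapply(runs$lengths, function(n) n:1))

# Same result without an explicit loop over the runs
b2 <- rev(sequence(rev(runs$lengths)))

# Assumption: the final run should be 0 because the value never changes again
b0 <- b
n_last <- runs$lengths[length(runs$lengths)]
b0[(length(a) - n_last + 1):length(a)] <- 0
```

Here b and b2 reproduce the output above, while b0 differs only in the final run.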
Using "data.table", you can do the following:
library(data.table)
data.table(a)[, b := .N:1, rleid(a)][]
# a b
# 1: 0 5
# 2: 0 4
# 3: 0 3
# 4: 0 2
# 5: 0 1
# 6: 1 1
# 7: 0 2
# 8: 0 1
# 9: 1 3
# 10: 1 2
# 11: 1 1
# 12: 0 5
# 13: 0 4
# 14: 0 3
# 15: 0 2
# 16: 0 1
# 17: 1 1
Upvotes: 2
Reputation: 3647
Here is another approach with lapply:
# The data
a=c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)
# An index of the data
ind <- 1:length(a)
# The function to apply
f <- function(x) {
  # Distance to the nearest value change; NA if the value never changes again
  d <- which(a[x] != a[x:length(a)])[1] - 1
  ifelse(!is.na(d),            # check whether we are in the last group before the series ends
         d,                    # if not, return the distance to the nearest value change
         length(a) - x + 1)    # if we are, return the remaining length of the last block
}
unlist(lapply(ind, f)) # Apply and unlist to vector
#> [1] 5 4 3 2 1 1 2 1 3 2 1 5 4 3 2 1 1
If you wanted, you could reduce it to just the which() statement, in which case the last block of homogeneous values would be assigned NA. Depending on the context, there are different ways you might want to treat the last block, as the number of repetitions until the value changes is censored there (maybe you want to supply a string like '1+' as the last-block term of the ifelse).
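As a rough sketch of that reduced variant (the names f_na, b_na and b_chr are made up for illustration, and the '1+' style labelling is just one possible choice):

```r
a <- c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)

# which()-only variant: distance to the next value change,
# NA where the value never changes again (the censored last block)
f_na <- function(x) which(a[x] != a[x:length(a)])[1] - 1
b_na <- sapply(seq_along(a), f_na)
b_na
# 5 4 3 2 1 1 2 1 3 2 1 5 4 3 2 1 NA

# Optional: label censored entries with the remaining length plus "+"
# (this turns the whole vector into character)
b_chr <- ifelse(is.na(b_na),
                paste0(length(a) - seq_along(a) + 1, "+"),
                b_na)
```

Note that the labelled version is a character vector, so it is only suitable for display, not for further arithmetic.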
Upvotes: 0
Reputation: 5580
How about this, using data.table? There's a bit of reverse ordering, and use of shift to compare each value with the next one. It might be a little convoluted, but it seems to work.
library( data.table )
dft <- data.table(a)
# f is TRUE where the next value (or the fill value, for the last row) differs
dft[ , f := shift( a, 1L, fill = FALSE, type = "lead" ) != a
][ .N:1, b := seq_len(.N), by = cumsum(f)
][ , f := NULL ]
dft
a b
1: 0 5
2: 0 4
3: 0 3
4: 0 2
5: 0 1
6: 1 1
7: 0 2
8: 0 1
9: 1 3
10: 1 2
11: 1 1
12: 0 5
13: 0 4
14: 0 3
15: 0 2
16: 0 1
17: 1 1
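For readers without data.table, the same reverse-ordering idea can be sketched in base R. This is an illustrative translation, not the code above: the names f, g and b are reused loosely, and the flag for the last row is simply set to FALSE here.

```r
a <- c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1)

# TRUE where the next value differs from the current one; the last row has
# no successor, so its flag is set to FALSE (an assumption of this sketch)
f <- c(a[-1] != a[-length(a)], FALSE)

# Accumulate the flags from the end, so every run gets its own group id
g <- rev(cumsum(rev(f)))

# Within each group, count down from the group size to 1
b <- ave(seq_along(a), g, FUN = function(i) rev(seq_along(i)))
b
# 5 4 3 2 1 1 2 1 3 2 1 5 4 3 2 1 1
```

Accumulating from the end is what makes a change flag at row i close the run that ends at row i, rather than opening a new group one row too early.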
Upvotes: 1