Reputation: 335
I have a dataframe test that looks like this:
ID Month1 Month2 Month3 Month4 Month5 Month6 Month7 Month8
A 0 1 1 1 1 0 0 0
B 0 1 1 1 0 0 NA NA
C 0 0 0 0 NA NA NA NA
D 0 1 1 1 1 1 1 1
E 0 1 1 0 0 0 1 1
F 0 0 0 1 1 1 1 1
G 0 0 1 1 0 0 0 0
H 0 0 0 0 0 0 0 0
I want to create a conditional variable “Result” within the data frame that counts the number of times 1 is repeated in a row. However, I only want to count the first set of 1s that is followed by a 0, which means that there are three conditions that I need to keep in mind when crating the “Result” variable. The conditions are as follows:
Condition 1: Count the first set of 1s that end with a 0.
Condition 2: If the set of 1s does not end in 0 or ends in NA, Result=NA
Condition 3 If there are two sets of 1s that end in the zero, ignore the second one.
ID Month1 Month2 Month3 Month4 Month5 Month6 Month7 Month8 Result
A 0 1 1 1 1 0 0 0 4
B 0 1 1 1 0 0 NA NA 3
C 0 0 0 0 NA NA NA NA 0
D 0 1 1 1 1 1 1 1 NA
E 0 1 1 0 0 0 1 1 2
F 0 0 0 1 1 1 1 1 NA
G 0 0 1 1 0 0 0 0 2
H 0 0 0 0 0 0 0 0 0
This that I have tried so far:
test$Result=apply(test[,c(2:9)],1,function(x) length(x[x==1&!(is.na(x))]))
this counts the total number of 1 in the row and not just the first set of ones.
Any and all help would be much appreciated.
Upvotes: 2
Views: 315
Reputation: 17412
This should work although there may be a tidier solution:
FindResult = function(x)
{
RLE = rle(x)
i = which(RLE$values == 1)[1]
if(is.na(i))
return(0)
if (length(RLE$values) > i & RLE$values[i + 1] == 0)
return(RLE$lengths[i])
else
return(NA)
}
Result = apply(df[,-1], 1, FindResult)
> Result
[1] 4 3 0 NA 2 NA 2 0
Upvotes: 1
Reputation: 206232
I think using rle()
to calculate `lengths of runs of values would be helpful here. Here's how i'm assiming your data is stored in R
dd <- structure(list(ID = structure(1:8, .Label = c("A", "B", "C",
"D", "E", "F", "G", "H"), class = "factor"), Month1 = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Month2 = c(1L, 1L, 0L, 1L, 1L, 0L, 0L,
0L), Month3 = c(1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L), Month4 = c(1L,
1L, 0L, 1L, 0L, 1L, 1L, 0L), Month5 = c(1L, 0L, NA, 1L, 0L, 1L,
0L, 0L), Month6 = c(0L, 0L, NA, 1L, 0L, 1L, 0L, 0L), Month7 = c(0L,
NA, NA, 1L, 1L, 1L, 0L, 0L), Month8 = c(0L, NA, NA, 1L, 1L, 1L,
0L, 0L)), .Names = c("ID", "Month1", "Month2", "Month3", "Month4",
"Month5", "Month6", "Month7", "Month8"), class = "data.frame",
row.names = c(NA, -8L))
Then I would run code like
dd$Result <- apply(dd[, -1], 1, function(x) {
rr<-rle(x)
f1<-which(rr$values==1)[1]
if ( is.na(f1) ) {
0
} else if (!is.na(rr$values[f1+1]) && rr$values[f1+1]==0) {
rr$lengths[f1]
} else {
NA
}
})
which gives
ID Month1 Month2 Month3 Month4 Month5 Month6 Month7 Month8 Result
1 A 0 1 1 1 1 0 0 0 4
2 B 0 1 1 1 0 0 NA NA 3
3 C 0 0 0 0 NA NA NA NA 0
4 D 0 1 1 1 1 1 1 1 NA
5 E 0 1 1 0 0 0 1 1 2
6 F 0 0 0 1 1 1 1 1 NA
7 G 0 0 1 1 0 0 0 0 2
8 H 0 0 0 0 0 0 0 0 0
as you desired.
You do have a few edges cases that made this code somewhat tricky. First we have to see of there are any 1's; if not, we return 0. Then if there is a one, we want to make sure it's followed by a 0, otherwise return NA
. We perform this logic for each row with an apply()
, leaving off the ID column. It's not entirely clear from your description what should happen with NA
values if they appear before the end of the row (like at the beginning) so i'll just assume that's a case you don't have to worry about.
Upvotes: 3