Reputation: 3805
I have a data frame with 57 columns and 122 rows. For each column, I want to calculate two things:
(1) The number of values less than -1
(2) The number of times a value less than -1 appears three or more times in a row. For example, if the data is:
dat<-c(1,-1,-1.3,-1.2,-1,0.5,3.2,2.2,-1,-1,0,-4,-3,-2,-1,2)
For part (1) of the question, I did this:
bd<-sum(dat< -1)
>5
For part (2) it was complicated:
tmpdat<-data.frame(values=dat, tmp_vals=dat)
tmpdat$tmp_vals[tmpdat$values<(-1)]<-"lower"
bds<-data.frame(Values=rle(tmpdat$tmp_vals)$values,Sequential=rle(tmpdat$tmp_vals)$lengths)
sum(bds$Sequential >= 3 & bds$Values == "lower")
>1
I want to create a loop to do this for each column of my data frame. This is what the loop looks like for (1):
res <- numeric(ncol(d.f))  # store the counts here instead of overwriting the data
for (i in seq_len(ncol(d.f))) {
  res[i] <- sum(d.f[[i]] < -1)
}
I want to extend this loop to handle the second part as well, but I do not know how to do it. Thanks for your help.
Upvotes: 2
Views: 337
Reputation: 887831
For the first question, it may be easier to use colSums. We get a logical matrix with df1 < -1 and sum the TRUE values in each column of that matrix with colSums:
colSums(df1< -1, na.rm=TRUE)
na.rm=TRUE can be used as an optional argument in case there are some missing values (NA).
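As a minimal sketch of the colSums approach, the example below uses a small made-up data frame (the column names a and b and their values are assumptions for illustration, not the asker's data). It shows the intermediate logical matrix and how na.rm = TRUE skips the missing value.

```r
# Hypothetical example data; the real df1 has 57 columns and 122 rows
df1 <- data.frame(
  a = c(1, -1.3, -1.2, NA, 0.5),
  b = c(-2, -3, -4, 2, -1)
)

# Comparing a data frame with a scalar gives a logical matrix;
# the NA in column a stays NA in the result
below <- df1 < -1

# Count the TRUE values per column, ignoring the NA
counts <- colSums(below, na.rm = TRUE)
counts
# a b
# 2 3
```

Note that -1 itself is not counted, because the comparison is strict.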
For the second question, we can loop through the columns of the dataset ('df1') with vapply and apply rle to each one (modified from @David Arenburg's comments):
vapply(df1, function(x)
sum(with(rle(x < -1), lengths[values]) > 2), numeric(1))
For looping through the columns, we can also use lapply or sapply. Here, I used vapply because it may be a bit faster and it is also safer: it throws an error if the function returns anything other than a single numeric value for a column. In each column, we take the rle of x < -1, i.e. the run lengths of TRUE and FALSE, subset the lengths corresponding to the TRUE values (lengths[values]), check whether each one is greater than 2 (> 2), and take the sum.
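To make the rle step concrete, here is a sketch that walks through it on the dat vector from the question. The helper name count_runs, its min_len argument, and the two-column data frame are assumptions added for illustration; they are not part of the original answer.

```r
dat <- c(1, -1, -1.3, -1.2, -1, 0.5, 3.2, 2.2, -1, -1, 0, -4, -3, -2, -1, 2)

# Run-length encode the logical vector dat < -1
r <- rle(dat < -1)
r$lengths  # 2 2 7 3 2
r$values   # FALSE TRUE FALSE TRUE FALSE

# Keep only the lengths of the TRUE runs: 2 and 3
true_runs <- r$lengths[r$values]

# Hypothetical helper: number of runs below -1 that last at least min_len values
count_runs <- function(x, min_len = 3) {
  r <- rle(x < -1)
  sum(r$lengths[r$values] >= min_len)
}

# Illustrative data frame: dat and its reverse as two columns
df1 <- data.frame(u = dat, v = rev(dat))
res <- vapply(df1, count_runs, numeric(1))
res
# u v
# 1 1
```

Only the run -4, -3, -2 qualifies; the run -1.3, -1.2 has length 2, which matches the result of 1 in the question. If a column contains NA, rle treats each NA as its own run and the sum becomes NA, so missing values may need to be handled before calling the helper.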
Upvotes: 1