89_Simple

Reputation: 3805

looping through a dataframe

I have a dataframe with 57 columns and 122 rows. For each column, I want to calculate two things:

(1) The number of values less than -1.

(2) The number of times values less than -1 appear three or more times consecutively. For example, if the data is:

dat<-c(1,-1,-1.3,-1.2,-1,0.5,3.2,2.2,-1,-1,0,-4,-3,-2,-1,2)

For part (1) of the question, I did this:

bd<-sum(dat< -1)
>5

For part (2) it was complicated:

tmpdat<-data.frame(values=dat, tmp_vals=dat)
tmpdat$tmp_vals[tmpdat$values<(-1)]<-"lower"
bds<-data.frame(Values=rle(tmpdat$tmp_vals)$values,Sequential=rle(tmpdat$tmp_vals)$lengths)
sum(bds$Sequential >= 3 & bds$Values == "lower")
>1 

I want to create a loop to do this for each column of my data frame. This is what the loop looks like for (1):

for (i in 1:ncol(d.f)){
        d.f[i]<-sum(d.f[i]< -1)

}

I want to handle the second part within this loop as well, but do not know how to do it. Thanks for your help.

Upvotes: 2

Views: 337

Answers (1)

akrun

Reputation: 887831

For the first question, it may be easier to use colSums. The comparison df1 < -1 gives a logical matrix, and colSums counts the TRUE values in each column:

 colSums(df1< -1, na.rm=TRUE)

na.rm=TRUE can be used as an optional argument in case there are some missing values (NA).
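As a minimal sketch of the colSums approach, assuming a small toy data frame (the name 'df1' follows the answer, but the columns 'a' and 'b' and their values are made up for illustration; column 'a' reuses the 'dat' vector from the question):

```r
# Toy data frame; the columns and values are assumptions for illustration only
df1 <- data.frame(
  a = c(1, -1, -1.3, -1.2, -1, 0.5, 3.2, 2.2, -1, -1, 0, -4, -3, -2, -1, 2),
  b = c(-2, -2, -2, 0, NA, -5, 1, 1, -1.5, -1.5, -1.5, -1.5, 0, 0, 0, 0)
)

# df1 < -1 is a logical matrix; colSums counts the TRUE entries per column.
# na.rm = TRUE skips the NA in column 'b' instead of returning NA for it.
below <- colSums(df1 < -1, na.rm = TRUE)
print(below)
```

Without na.rm=TRUE, the count for column 'b' would be NA, because the comparison NA < -1 is itself NA.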

For the second question, we can loop through the columns of the dataset ('df1') with vapply and apply rle to each one (modified from @David Arenburg's comments):

  vapply(df1, function(x) 
          sum(with(rle(x < -1), lengths[values]) > 2), numeric(1))

For looping through the columns, we can also use lapply/sapply. Here, I used vapply as it may be a bit faster and it is also safer: it checks that each result matches the declared type and length (numeric(1)) and gives an error otherwise. In each column, we get the rle of x < -1, i.e. the run lengths of TRUE and FALSE, subset the lengths corresponding to the TRUE values (lengths[values]), check whether each is greater than 2 (> 2), and get the sum.
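The rle steps described above can be traced on the 'dat' vector from the question. This is only a sketch: 'count_long_runs' is a hypothetical helper name, its 'threshold' and 'min_len' arguments are assumptions, and treating NA as "not below the threshold" is a choice made here, not something stated in the original answer.

```r
# Hypothetical helper: counts runs of at least 'min_len' consecutive values
# below 'threshold'. NA is treated as "not below" (an assumption), so an NA
# ends any run in progress.
count_long_runs <- function(x, threshold = -1, min_len = 3) {
  below <- !is.na(x) & x < threshold
  r <- rle(below)
  sum(r$lengths[r$values] >= min_len)
}

dat <- c(1, -1, -1.3, -1.2, -1, 0.5, 3.2, 2.2, -1, -1, 0, -4, -3, -2, -1, 2)

# Step by step: run-length encode the logical vector, keep the TRUE runs
r <- rle(dat < -1)
true_runs <- r$lengths[r$values]   # lengths of the runs where dat < -1
print(true_runs)

# The same logic wrapped in the helper, for one vector and for every column
n_dat <- count_long_runs(dat)
df1 <- data.frame(
  a = dat,
  b = c(-2, -2, -2, 0, NA, -5, 1, 1, -1.5, -1.5, -1.5, -1.5, 0, 0, 0, 0)
)
res <- vapply(df1, count_long_runs, numeric(1))
print(res)
```

For 'dat' there are two runs of values below -1, of lengths 2 and 3, so only one of them counts, which matches the result in the question. Keeping the logic in a named function also makes it straightforward to call inside a for loop over the columns if that is preferred to vapply.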

Upvotes: 1
