Dimitris K.

Reputation: 65

Set values less than threshold to zero, with column-specific thresholds

I have two data frames. The first contains 165 columns (POINTID plus species names) and almost 193,000 rows. Each cell holds a number from 0 to 1, which is the probability that the species is present in that cell.

 POINTID Abie_Xbor Acer_Camp Acer_Hyrc Acer_Obtu Acer_Pseu Achi_Gran
  2      0.0279037  0.604687 0.0388309 0.0161980 0.0143966  0.240152
  3      0.0294101  0.674846 0.0673055 0.0481405 0.0397423  0.231308
  4      0.0292839  0.603869 0.0597947 0.0526606 0.0463431  0.188875
  6      0.0331264  0.541165 0.0470451 0.0270871 0.0373348  0.256662
  8      0.0393825  0.672371 0.0715808 0.0559353 0.0565391  0.230833
  9      0.0376557  0.663732 0.0747417 0.0445794 0.0602539  0.229265

The second data frame contains 164 columns (the same species names as in the first data frame) and a single row holding the threshold for each species: above it we assume the species is present, and below it we assume the species is absent.

Abie_Xbor Acer_Camp Acer_Hyrc Acer_Obtu Acer_Pseu Achi_Gran Acta_Spic 
 0.3155    0.2816    0.2579    0.2074    0.3007    0.3513    0.3514

What I want is a new data frame that, for every species in the presence-probability data (my.data), keeps the probability where it is above that species' threshold (thres) and holds zero where it is below the threshold.

I know this would probably take a for loop and an if statement, but I am new to R and don't know how to do this. Please help me.

Upvotes: 0

Views: 3350

Answers (3)

IRTFM

Reputation: 263301

This produces a logical matrix that can be used for assignment with "[<-" (assuming the multi-row data frame is named "cols" and the named vector is "vec"):

sweep(cols[-1], 2, vec, ">") # identifies the items to keep

cols[-1][ sweep(cols[-1], 2, vec, "<") ] <- 0

Your example produced a warning about the mismatch of the number of columns with the length of the vector, but presumably you can adjust the length of the vector to be the correct number of entries.
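One way to adjust the vector, sketched under assumptions: if the thresholds are a named vector, they can be aligned to the species columns by name before calling sweep, which drops any extra species and fixes the order. The toy values and the names "species", "vec_aligned" and "below" are illustrative and not part of the question's data.

```r
# Hypothetical toy data standing in for the question's two objects
cols <- data.frame(POINTID   = c(2, 3, 4),
                   Abie_Xbor = c(0.03, 0.40, 0.31),
                   Acer_Camp = c(0.60, 0.10, 0.28))
# The thresholds carry one extra species that is absent from 'cols'
vec <- c(Abie_Xbor = 0.3155, Acer_Camp = 0.2816, Acta_Spic = 0.3514)

species <- setdiff(names(cols), "POINTID")
stopifnot(all(species %in% names(vec)))   # every species needs a threshold
vec_aligned <- vec[species]               # reorder and drop extras by name

# Logical matrix marking the cells that fall below their column's threshold
below <- sweep(as.matrix(cols[species]), 2, vec_aligned, "<")
cols[species][below] <- 0
cols
```

Matching by name rather than by position also guards against the two objects listing the species in a different order.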

Upvotes: 0

igelkott

Reputation: 1287

It's simpler to have the same number of columns (with the same meanings of course).

frame2 = data.frame(POINTID=0, frame2)

R works with vectors, so a row of frame1 can be directly compared to frame2:

frame1[1,] < frame2

You could use an explicit loop over every row of frame1, but it's common to use the implicit loop of "apply":

answer = apply(frame1, 1, function(x) x < frame2)

This was all a rather sloppy solution (especially changing frame2), but it hopefully demonstrates some basic R. Also, I'd generally prefer arrays and matrices when possible (they can still use labels but are generally faster).
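As a rough illustration of the matrix preference, here is a minimal sketch that leaves frame2 unchanged and zeroes the below-threshold cells in one vectorized step. The toy values and the names "m" and "thr" are assumptions made for the example.

```r
# Hypothetical toy data; the names follow this answer's frame1/frame2
frame1 <- data.frame(POINTID   = c(2, 3, 4),
                     Abie_Xbor = c(0.03, 0.40, 0.31),
                     Acer_Camp = c(0.60, 0.10, 0.28))
frame2 <- data.frame(Abie_Xbor = 0.3155, Acer_Camp = 0.2816)

m   <- as.matrix(frame1[names(frame2)])   # species columns only, as a matrix
thr <- unlist(frame2)                     # named numeric vector of thresholds

# Matrices are stored column by column, so repeating each threshold
# nrow(m) times lines it up with its own column
m[m < rep(thr, each = nrow(m))] <- 0

# Put the ID column back in front of the processed species columns
answer <- data.frame(POINTID = frame1$POINTID, m)
answer
```

Selecting the species columns with names(frame2) means POINTID never takes part in the comparison, so no placeholder column has to be added to frame2.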

Upvotes: 1

Ben Bolker

Reputation: 226087

I think you want something like this:

(Make up a small reproducible example)

 set.seed(101)
 speciesdat <- data.frame(pointID=1:10,matrix(runif(100),ncol=10,
                         dimnames=list(NULL,LETTERS[1:10])))
 threshdat <- as.data.frame(rbind(seq(0.1,1,by=0.1))) ## one-row data frame of thresholds

Now process:

 thresh <- unlist(threshdat) ## make data frame into a vector
 ## 'sweep' runs the function column-by-column if MARGIN=2
 ss2 <- sweep(as.matrix(speciesdat[,-1]),MARGIN=2,STATS=thresh,
             FUN=function(x,y) ifelse(x<y,0,x))
 ## recombine results with the first column
 speciesdat2 <- data.frame(pointID=speciesdat$pointID,ss2)
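
Since the question expected a for loop with an if statement, the following sketch compares the sweep result with an explicit loop over the species columns on the same made-up data. The name "looped" is hypothetical, and the thresholds are written directly as a plain vector here.

```r
set.seed(101)
speciesdat <- data.frame(pointID = 1:10,
                         matrix(runif(100), ncol = 10,
                                dimnames = list(NULL, LETTERS[1:10])))
thresh <- seq(0.1, 1, by = 0.1)   # one threshold per species column

# Vectorized version, as in the answer above
ss2 <- sweep(as.matrix(speciesdat[, -1]), MARGIN = 2, STATS = thresh,
             FUN = function(x, y) ifelse(x < y, 0, x))

# Explicit loop over the species columns
looped <- speciesdat
for (j in seq_along(thresh)) {
  col <- looped[[j + 1]]          # + 1 skips the pointID column
  col[col < thresh[j]] <- 0       # zero the values below this column's threshold
  looped[[j + 1]] <- col
}

# Both approaches should give the same numbers
stopifnot(isTRUE(all.equal(unname(as.matrix(looped[, -1])), unname(ss2))))
```

The loop runs once per species rather than once per cell, so it should stay manageable for a few hundred columns; sweep is simply more compact.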

Upvotes: 1
