Tina Van Regenmortel

Reputation: 171

Subsetting a data frame by column contents

I have the following data frame:

Test <- data.frame(Species = c("A","B","C","D"), 
       WB1=c(0.1,1.1,0.9,1.2), 
       WB2=c(1, 0.8, 1.3, 1),
       WB3=c(0.5, 0.7, 1.2, 0.9),
       WB4=c(1.3, 1.2, 0.9, 0.6))

And I would like to get a new data frame per species that lists only the WBs that are larger than one. So in this example, for species A, that would be:

WB2 WB4
1.0 1.3

I have tried the following:

AllSpecies <- Test$Species
AllWaterbodies <- colnames(Test)
for(species in AllSpecies)
{ 
  ind <- which(Test$Species == species)
  x <- Test[ind,]
  colnames(x) <- AllWaterbodies

If, say, species <- "A", then this would already give me:

  Species WB1 WB2 WB3  NA
1       A 0.1   1 0.5 1.3

Now I would like to list only the WBs that are larger than one, and this is where I am stuck. Can anybody help me complete my loop?

Upvotes: 1

Views: 259

Answers (3)

mjv

Reputation: 75315

Here's a two-liner...
It produces a list named results containing one data.frame per desired species.
Each data frame is a subset of the corresponding row of Test, the original frame, in which only the columns that pass the >= 1.0 filter are retained.

results <- list()
for (spc in c('A', 'B', 'C', 'D')) 
   results[[spc]] <- Test[Test$Species==spc, 
                          c( TRUE, Test[Test$Species==spc, -1] >= 1.0)]

> results
$A
  Species WB2 WB4
1       A   1 1.3
$B
  Species WB1 WB4
2       B 1.1 1.2
$C
  Species WB2 WB3
3       C 1.3 1.2
$D
  Species WB1 WB2
4       D 1.2   1

Of course, we can use for (spc in Test[, "Species"]) rather than an explicit list when we want all species.
The snippet can also be tweaked to give the list's elements fancier names and/or to exclude the Species column from the individual data frames, e.g.:

> results <- list()
> for (spc in c('A', 'C')) 
     results[[paste("Record for Species", spc)]] <-
            Test[Test$Species==spc, 
                 c(FALSE , Test[Test$Species==spc, -1] >= 1.0)]
> results
$`Record for Species A`
  WB2 WB4
1   1 1.3

$`Record for Species C`
  WB2 WB3
3 1.3 1.2
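
For comparison, here is a minimal sketch of the same idea written as a completed version of the loop from the question. It assumes Test is defined as in the question with exactly one row per species; the names threshold and per_species are illustrative and not part of the original code.

```r
Test <- data.frame(Species = c("A", "B", "C", "D"),
                   WB1 = c(0.1, 1.1, 0.9, 1.2),
                   WB2 = c(1, 0.8, 1.3, 1),
                   WB3 = c(0.5, 0.7, 1.2, 0.9),
                   WB4 = c(1.3, 1.2, 0.9, 0.6),
                   stringsAsFactors = FALSE)

threshold <- 1          # illustrative name; the cut-off used in this answer
per_species <- list()   # illustrative name; one data frame per species

for (species in Test$Species) {
  # Keep the single row for this species and drop the Species column
  x <- Test[Test$Species == species, -1, drop = FALSE]
  # unlist() turns the one-row data frame into a named numeric vector,
  # so the comparison yields one TRUE/FALSE per WB column
  keep <- unlist(x) >= threshold
  per_species[[species]] <- x[, keep, drop = FALSE]
}

per_species$A
```

Using drop = FALSE keeps each result a data frame even when only one column passes the filter.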

Upvotes: 1

Jilber Urbina

Reputation: 61214

A base R solution using lapply:

 lapply(split(Test[,-1], Test$Species), function(x) x[which(x>1)])
$A
  WB4
1 1.3

$B
  WB1 WB4
2 1.1 1.2

$C
  WB2 WB3
3 1.3 1.2

$D
  WB1
4 1.2

This gives the same result as @Beasterfield's answer, with no need to install an extra package.

You're asking for values larger than one, but your desired output shows values larger than or equal to 1, so maybe the code you're looking for is the following:

lapply(split(Test[,-1], Test$Species), function(x) x[which(x>=1)])
$A
  WB2 WB4
1   1 1.3

$B
  WB1 WB4
2 1.1 1.2

$C
  WB2 WB3
3 1.3 1.2

$D
  WB1 WB2
4 1.2   1
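
Since the only difference between the two calls is > versus >=, both can be folded into one helper. The following is only a sketch: the function name subset_by_threshold and its arguments threshold and strict are made up for illustration, and it assumes the first column holds the species and that each species has exactly one row (with several rows per species, the column selection would no longer be well defined).

```r
Test <- data.frame(Species = c("A", "B", "C", "D"),
                   WB1 = c(0.1, 1.1, 0.9, 1.2),
                   WB2 = c(1, 0.8, 1.3, 1),
                   WB3 = c(0.5, 0.7, 1.2, 0.9),
                   WB4 = c(1.3, 1.2, 0.9, 0.6))

# Hypothetical helper: strict = TRUE uses >, strict = FALSE uses >=
subset_by_threshold <- function(df, threshold = 1, strict = TRUE) {
  # Split the value columns (everything but the first) by species
  lapply(split(df[, -1, drop = FALSE], df[[1]]), function(x) {
    vals <- unlist(x)  # named numeric vector for the single row
    keep <- if (strict) vals > threshold else vals >= threshold
    x[, keep, drop = FALSE]
  })
}

strict_result <- subset_by_threshold(Test)                  # values > 1
loose_result  <- subset_by_threshold(Test, strict = FALSE)  # values >= 1
```

Here strict_result$A should contain only WB4, while loose_result$A should contain WB2 and WB4, matching the two outputs above.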

Upvotes: 3

Beasterfield

Reputation: 7123

Is this what you want?

library("plyr")
dlply( Test, "Species", function(x){
  x[ ,c( FALSE, x[,2:5] > 1), drop = FALSE ]
})

Output:

$A
  WB4
1 1.3

$B
  WB1 WB4
2 1.1 1.2

$C
  WB2 WB3
3 1.3 1.2

$D
  WB1
4 1.2
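
The drop = FALSE argument matters for species such as A and D, where only one column passes the filter. A small sketch of the difference, using a hand-built copy of the row for species A (the variable names collapsed and kept are illustrative):

```r
# One-row data frame equivalent to the row for species A
x <- data.frame(Species = "A", WB1 = 0.1, WB2 = 1, WB3 = 0.5, WB4 = 1.3)

# FALSE drops the Species column; the rest flags values above 1
keep <- c(FALSE, unlist(x[, 2:5]) > 1)

collapsed <- x[, keep]                # default drop: a bare number, column name lost
kept      <- x[, keep, drop = FALSE]  # still a one-column data frame named WB4

is.data.frame(collapsed)
is.data.frame(kept)
```

Without drop = FALSE, the single remaining column is simplified to a plain vector, so the information about which waterbody it came from is lost.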

Upvotes: 2
