user2890989

Reputation: 45

random sampling from dataframe using column

I am trying to sample 50 taxa at random from my original dataframe, which contains 100 taxa, into a new dataframe. For the 50 randomly selected taxa I want to keep the information in all 4 columns. A subset of my original dataframe (high.diversity) looks like this:

                           Taxon              C     N    func.group
1         Curculionidae.Ischapterapion.sp. -29.06  2.19  herbivore
2         Curculionidae.Ischapterapion.sp. -29.27  1.60  herbivore
3              Curculionidae.Protapion.sp. -28.45  1.91  herbivore
4              Curculionidae.Protapion.sp. -25.99  0.55  herbivore
5              Curculionidae.Protapion.sp. -28.27  1.52  herbivore
6              Curculionidae.Hypera.meles  -25.41  3.38  herbivore
7                Curculionidae.Sitona.sp.  -27.05  2.01  herbivore
8                Curculionidae.Sitona.sp.  -26.70  3.07  herbivore
.....
230

Each Taxon has between 1 and 5 replicates, so I have 100 taxa but 230 data points (e.g. Curculionidae.Ischapterapion.sp. has 2 replicates in the table above).

I have successfully sampled 50 rows at random using the following code:

new.df<-high.diversity[sample(nrow(high.diversity),50),]

However, the above code gives 50 rows, whereas what I actually want is to select 50 taxa at random and keep all replicates for each of them (i.e. 50 taxa, each with multiple replicates, might give closer to 100 rows). I therefore need to change the code so that it selects 50 random taxa and includes all replicates within those taxa.

Could anyone suggest how I might achieve this?

Thanks very much,

M

Upvotes: 2

Views: 2493

Answers (1)

EDi

Reputation: 13310

Sample from the unique values of Taxon, then subset your data.frame to the sampled taxa:

df <- read.table(header = TRUE, stringsAsFactors=FALSE, text = '                          Taxon              C     N    func.group
1         Curculionidae.Ischapterapion.sp. -29.06  2.19  herbivore
2         Curculionidae.Ischapterapion.sp. -29.27  1.60  herbivore
3              Curculionidae.Protapion.sp. -28.45  1.91  herbivore
4              Curculionidae.Protapion.sp. -25.99  0.55  herbivore
5              Curculionidae.Protapion.sp. -28.27  1.52  herbivore
6              Curculionidae.Hypera.meles  -25.41  3.38  herbivore
7                Curculionidae.Sitona.sp.  -27.05  2.01  herbivore
8                Curculionidae.Sitona.sp.  -26.70  3.07  herbivore')

set.seed(1234)
take <- sample(unique(df$Taxon), 2)
df[df$Taxon %in% take, ]
                             Taxon      C    N func.group
1 Curculionidae.Ischapterapion.sp. -29.06 2.19  herbivore
2 Curculionidae.Ischapterapion.sp. -29.27 1.60  herbivore
3      Curculionidae.Protapion.sp. -28.45 1.91  herbivore
4      Curculionidae.Protapion.sp. -25.99 0.55  herbivore
5      Curculionidae.Protapion.sp. -28.27 1.52  herbivore
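
Scaled up to the case in the question, the same two steps might look like the sketch below. The real high.diversity is not available here, so the block first builds a made-up stand-in with 100 taxa and 1 to 5 replicates each; the taxon names and the C and N values are purely illustrative assumptions.

```r
# Toy stand-in for high.diversity: 100 taxa with 1-5 replicates each.
# Taxon names and isotope values are invented for illustration only.
set.seed(42)
n_reps <- sample(1:5, 100, replace = TRUE)
high.diversity <- data.frame(
  Taxon = rep(paste0("Taxon.", seq_len(100)), times = n_reps),
  C = round(runif(sum(n_reps), -30, -25), 2),
  N = round(runif(sum(n_reps), 0, 4), 2),
  func.group = "herbivore",
  stringsAsFactors = FALSE
)

# Step 1: draw 50 distinct taxa (sampling without replacement).
take <- sample(unique(high.diversity$Taxon), 50)

# Step 2: keep every row (all replicates) belonging to the drawn taxa.
new.df <- high.diversity[high.diversity$Taxon %in% take, ]

# Sanity checks: 50 taxa kept, and replicate counts match the original.
n_taxa_kept <- length(unique(new.df$Taxon))
reps_kept <- table(new.df$Taxon)
reps_orig <- table(high.diversity$Taxon)[names(reps_kept)]
all_reps_kept <- all(reps_kept == reps_orig)
```

With the real data, only the two lines under "Step 1" and "Step 2" should be needed. The number of rows in new.df will vary from draw to draw, since it depends on how many replicates the selected taxa happen to have.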

Upvotes: 2
