Reputation: 45
I am trying to sample 50 taxa at random from my original data frame, which contains 100 taxa, into a new data frame. For the 50 randomly selected taxa I want to keep the information in all 4 columns. A subset of my original data frame (high.diversity) looks like this:
                             Taxon      C    N func.group
1 Curculionidae.Ischapterapion.sp. -29.06 2.19  herbivore
2 Curculionidae.Ischapterapion.sp. -29.27 1.60  herbivore
3      Curculionidae.Protapion.sp. -28.45 1.91  herbivore
4      Curculionidae.Protapion.sp. -25.99 0.55  herbivore
5      Curculionidae.Protapion.sp. -28.27 1.52  herbivore
6       Curculionidae.Hypera.meles -25.41 3.38  herbivore
7         Curculionidae.Sitona.sp. -27.05 2.01  herbivore
8         Curculionidae.Sitona.sp. -26.70 3.07  herbivore
.....
230
Each taxon has between 1 and 5 replicates, so I have 100 taxa but 230 data points (e.g. Curculionidae.Ischapterapion.sp. has 2 replicates in the table above).
I have successfully sampled 50 rows at random using the following code:
new.df <- high.diversity[sample(nrow(high.diversity), 50), ]
However, the code above returns 50 rows, whereas what I actually want is to select 50 taxa at random and keep all replicates for each of them (50 taxa with multiple replicates each might give closer to 100 rows). So I need to change the code to select 50 random taxa and include every replicate of those taxa.
Could anyone suggest how I might achieve this?
Thanks very much,
M
Upvotes: 2
Views: 2493
Reputation: 13310
Sample from the unique taxa and then subset your data.frame to those taxa:
df <- read.table(header = TRUE, stringsAsFactors=FALSE, text = ' Taxon C N func.group
1 Curculionidae.Ischapterapion.sp. -29.06 2.19 herbivore
2 Curculionidae.Ischapterapion.sp. -29.27 1.60 herbivore
3 Curculionidae.Protapion.sp. -28.45 1.91 herbivore
4 Curculionidae.Protapion.sp. -25.99 0.55 herbivore
5 Curculionidae.Protapion.sp. -28.27 1.52 herbivore
6 Curculionidae.Hypera.meles -25.41 3.38 herbivore
7 Curculionidae.Sitona.sp. -27.05 2.01 herbivore
8 Curculionidae.Sitona.sp. -26.70 3.07 herbivore')
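set.seed(1234)  # make the random draw reproducible
# draw 2 taxa at random from the distinct taxon names
take <- sample(unique(df$Taxon), 2)
# keep every row (all replicates) belonging to the drawn taxa
df[df$Taxon %in% take, ]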
                             Taxon      C    N func.group
1 Curculionidae.Ischapterapion.sp. -29.06 2.19  herbivore
2 Curculionidae.Ischapterapion.sp. -29.27 1.60  herbivore
3      Curculionidae.Protapion.sp. -28.45 1.91  herbivore
4      Curculionidae.Protapion.sp. -25.99 0.55  herbivore
5      Curculionidae.Protapion.sp. -28.27 1.52  herbivore
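Scaled up to the case in the question, the same idea might look like the sketch below. The data frame built here is only a hypothetical stand-in for high.diversity (made-up taxon names and values, 100 taxa with 1 to 5 replicates each), so that the example runs on its own; with the real data only the last two assignments should be needed.

```r
# Hypothetical stand-in for high.diversity: 100 taxa, 1 to 5 replicates each.
# The taxon names and the C / N values are invented for illustration only.
set.seed(42)
taxa <- paste0("Taxon.", 1:100)
reps <- sample(1:5, length(taxa), replace = TRUE)
high.diversity <- data.frame(
  Taxon = rep(taxa, times = reps),
  C = round(runif(sum(reps), -30, -25), 2),
  N = round(runif(sum(reps), 0, 4), 2),
  func.group = "herbivore",
  stringsAsFactors = FALSE
)

# Draw 50 distinct taxa, then keep every row (replicate) belonging to them.
take <- sample(unique(high.diversity$Taxon), 50)
new.df <- high.diversity[high.diversity$Taxon %in% take, ]

length(unique(new.df$Taxon))  # number of taxa kept
nrow(new.df)                  # number of rows; depends on which taxa were drawn
```

Because the sampling is done on unique(Taxon) rather than on row numbers, each taxon has the same chance of being drawn regardless of how many replicates it has, and all four columns are carried along by the row subset.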
Upvotes: 2