Reputation: 73
I'm working with a dataframe (in R) that contains observations of animals in the wild (recording time/date, location, and species identification). I want to remove all rows for a species if there are fewer than x observations of it in the whole dataframe. So far I have managed to get it to work with the following code, but I know there must be a more elegant and efficient way to do it.
namelist <- names(table(ind.data$Species))
for (i in 1:length(namelist)) {
  if (table(ind.data$Species)[namelist[i]] <= 2) {
    while (namelist[i] %in% ind.data$Species) {
      j <- match(namelist[i], ind.data$Species)
      ind.data <- ind.data[-j,]
    }
  }
}
The namelist vector contains all the species names in the data frame ind.data, and the if statement checks whether the frequency of the ith name on the list is at or below the cutoff (2 in this example).
I'm fully aware that this is not a very clean way to do it; I just threw it together at the end of the day yesterday to see if it would work. Now I'm looking for a better way to do it, or at least for ways to refine it.
Upvotes: 1
Views: 1767
Reputation: 887881
We can use data.table
library(data.table)
setDT(ind.data)[, .SD[.N > 2], by = Species]
Upvotes: 0
Reputation: 78630
You can do this with the dplyr package:
library(dplyr)
new.ind.data <- ind.data %>%
  group_by(Species) %>%
  filter(n() > 2) %>%
  ungroup()
An alternative using built-in functions is ave():
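# Pass a numeric vector as the first argument so the group sizes come back as numbers;
# passing the Species column itself would return character (or factor) values.
group_sizes <- ave(seq_along(ind.data$Species), ind.data$Species, FUN = length)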
new.ind.data <- ind.data[group_sizes > 2, ]
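To see the same idea end to end, here is a minimal base-R sketch on made-up data. It uses table() as a lookup instead of ave(); the toy data frame, the Site column, and the min_obs name are assumptions for illustration, not part of the original data.

```r
# Toy data; every column except Species is invented for this example.
ind.data <- data.frame(
  Species = c("fox", "fox", "fox", "deer", "deer", "owl", "fox"),
  Site    = c("A", "B", "A", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical threshold: keep species observed at least this many times.
min_obs <- 3

# table() counts each species once; indexing the counts by species name
# expands them back to one count per row of the data frame.
counts <- table(ind.data$Species)
keep <- as.vector(counts[as.character(ind.data$Species)]) >= min_obs

# Subset once with a logical vector instead of deleting rows in a loop.
new.ind.data <- ind.data[keep, , drop = FALSE]
print(new.ind.data)
```

Because the rows are selected in a single vectorised step, the data frame is copied once rather than once per deleted row, which is where the original loop spends most of its time.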
Upvotes: 1