Removing rows of dataframe based on frequency of a variable

Question

I'm working with a dataframe in R that contains observations of animals in the wild (time/date, location, and species identification). I want to remove every row for a given species if that species has fewer than x observations in the whole dataframe. I got it working with the following code, but I know there must be a more elegant and efficient way to do it.

namelist <- names(table(ind.data$Species))
for (i in 1:length(namelist)) {
  if (table(ind.data$Species)[namelist[i]] <= 2) {
    while (namelist[i] %in% ind.data$Species) {
      j <- match(namelist[i], ind.data$Species)
      ind.data <- ind.data[-j,]
    }
  }
}

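The namelist vector contains all the species names in the data frame ind.data, and the if statement checks whether the frequency of the ith name on the list is at most 2 (that is, fewer than x = 3 observations in this example). When it is, the while loop deletes that species' rows one at a time.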

I'm fully aware that this is not a very clean way to do it; I just threw it together at the end of the day yesterday to see if it would work. Now I'm looking for a better way to do it, or at least for ways to refine it.

akrun · Accepted Answer

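We can use data.table: convert the data frame with setDT(), group by Species, and keep a group's rows (.SD) only when its row count (.N) is greater than 2.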

library(data.table)
setDT(ind.data)[, .SD[.N > 2], by = Species]
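
If adding a package is not an option, the same filter can probably be written in base R as well. The following is only a sketch, not part of the original answer: the toy ind.data values, the Site column, and the names min_obs, keep_species, and filtered are made up for illustration. It counts each species once with table() and then keeps the rows whose species meets the threshold.

```r
# Toy stand-in for ind.data (hypothetical values, for illustration only)
ind.data <- data.frame(
  Species = c("fox", "fox", "fox", "owl", "owl", "deer", "fox", "lynx"),
  Site    = c("A", "B", "A", "C", "C", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Assumed threshold: keep species with at least this many rows
# (equivalent to the question's "remove if count <= 2")
min_obs <- 3

# Count every species once instead of recomputing the table inside a loop
counts <- table(ind.data$Species)

# Names of the species that are frequent enough to keep
keep_species <- names(counts)[counts >= min_obs]

# Keep only rows belonging to those species; rows with NA species are dropped
filtered <- ind.data[ind.data$Species %in% keep_species, , drop = FALSE]

print(filtered)
```

Unlike the loop in the question, this builds the frequency table a single time and removes all rare species in one subsetting step, so it should scale much better on large data frames.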
