Reputation: 1
I have a large dataset of bird observations. I would like to count by group, i.e. count the species observed by various categories: year, season, and grid.
For example, how many American Crows (AMCR) were observed in 2017? Or how many American Robins were observed in 2017 in the breeding season (coded BB in the Season column)?
Here's an example of my headers and first line of data:
Data Headers
Year Season Date Grid Species Count Behavior
2015 BB 22-Jul-15 FF AMCR 1 C
I tried to use dplyr's count_ and group_by, but I think I'm doing it wrong. Please help!
Upvotes: 0
Views: 2257
Reputation: 5620
Here is another solution using dplyr. It is similar to the one previously suggested; however, I think it might be closer to what you want to do.
To count the number of observed species by year, season and grid:
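library(dplyr)
# Number of distinct species per year, season and grid
# (df is the observation data frame from the question)
df %>%
  # Grouping variables
  group_by(Year, Season, Grid) %>%
  # Keep one row per species within each group
  distinct(Species) %>%
  # Count the remaining rows, i.e. the number of species
  count(name = "SpCount")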
To count the number of observed birds by species, year, season and grid:
# Count number of birds per species
df %>%
  # Grouping variables
  group_by(Species, Year, Season, Grid) %>%
  # Sum the Count column within each group
  summarize(BirdCount = sum(Count))
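To tie this back to the specific questions asked (for example, how many AMCR were observed in 2017), a minimal sketch could filter the rows first and then sum Count. The small data frame below is made up purely for illustration: only the column names follow the headers in the question, and the codes "WW", "FG" and "AMRO" (assumed here to stand for American Robin) are my own placeholders.

```r
library(dplyr)

# Made-up toy data; only the column names come from the question
df <- data.frame(
  Year    = c(2017, 2017, 2017, 2017, 2015),
  Season  = c("BB", "BB", "WW", "BB", "BB"),
  Grid    = c("FF", "FG", "FF", "FF", "FF"),
  Species = c("AMCR", "AMCR", "AMCR", "AMRO", "AMCR"),
  Count   = c(1, 3, 2, 4, 5)
)

# Total AMCR observed in 2017, across all seasons and grids
amcr_2017 <- df %>%
  filter(Species == "AMCR", Year == 2017) %>%
  summarize(BirdCount = sum(Count))

# Total AMRO observed in 2017 during the BB season
amro_2017_bb <- df %>%
  filter(Species == "AMRO", Year == 2017, Season == "BB") %>%
  summarize(BirdCount = sum(Count))
```

Filtering before summarizing returns a single number per question, whereas the grouped versions return the full table of counts, which you can then filter afterwards if you prefer.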
Upvotes: 0
Reputation: 1233
It sounds like you're trying to count the number of observations within each group. This is what count in dplyr is designed for. The trick is that you don't need a group_by before it.
Here is some example code:
library(dplyr)
data("storms")
count_by_group <- storms %>%
  # The variables you want to count observations within
  count(year, month, status)
Alternatively, if you have a variable called "Count" in your raw data and you want to sum it up within each group, you should instead use summarize with group_by:
sum_by_group <- storms %>%
  group_by(year, month, status) %>%
  # Summing pressure is not meaningful in itself; it stands in for
  # whatever variable you want to sum up (e.g. your Count column)
  summarize(Count = sum(pressure))
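As a side note, count also accepts a wt argument, which sums a column within each group instead of counting rows, so the two cases above can be written side by side. This is only a sketch on a made-up data frame: the values are hypothetical, the column names are borrowed from the question, and "AMRO" is a placeholder code.

```r
library(dplyr)

# Made-up toy data for illustration only
obs <- data.frame(
  Year    = c(2017, 2017, 2017, 2015),
  Species = c("AMCR", "AMCR", "AMRO", "AMCR"),
  Count   = c(1, 3, 4, 5)
)

# Number of rows (observation records) per group
rows_per_group <- obs %>%
  count(Year, Species, name = "Records")

# Sum of the Count column (number of birds) per group
birds_per_group <- obs %>%
  count(Year, Species, wt = Count, name = "BirdCount")
```

The first call answers "how many records are there per year and species", the second "how many birds were seen per year and species"; which one you need depends on whether each row in your data is a single bird or a tally.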
Upvotes: 1