RGR_288
RGR_288

Reputation: 93

how to determine the number of unique values based on multiple criteria dplyr

I've got a df that looks like:

df(site=c(A,B,C,D,E), species=c(1,2,3,4), Year=c(1980:2010).

I would like to calculate the number of different years that each species appear in each site, creating a new column called nYear, I've tried filtering by group and using mutate combined with ndistinct values but it is not quite working.

Here is part of the code I have been using:

Df1 <- Df %>%
  filter(Year>1985)%>%
  mutate(nYear = n_distinct(Year[Year %in% site]))%>%
  group_by(Species,Site, Year) %>% 
  arrange(Species, .by_group=TRUE) 
  ungroup()

Upvotes: 0

Views: 886

Answers (2)

langtang
langtang

Reputation: 24777

You can use distinct() on the filtered frame, and then count by your groups of interest:

distinct(Df %>% filter(Year>1985)) %>%
  count(Site, Species,name = "nYear")

Upvotes: 1

gaut
gaut

Reputation: 5958

The approach is good, a few things to correct.

First, let's make some reproducible data (your code gave errors).

df <- data.frame("site"=LETTERS[1:5], "species"=1:5, "Year"=1981:2010)

You should have used summarise instead of mutate when you're looking to summarise values across groups. It will give you a shortened tibble as an output, with only the groups and the summary figures present (fewer columns and rows).

mutate on the other hand aims to modify an existing tibble, keeping all rows and columns by default.

The order of your functions in the chains also needs to change.

df %>%
  filter(Year>1985) %>%
  group_by(species,site) %>% 
  summarise(nYear = length(unique(Year))) %>% # instead of mutate
  arrange(species, .by_group=TRUE) %>% 
ungroup()

First, group_by(species,site), not year, then summarise and arrange.

# A tibble: 5 × 3
  species site  nYear
    <int> <chr> <int>
1       1 A         5
2       2 B         5
3       3 C         5
4       4 D         5
5       5 E         5

Upvotes: 2

Related Questions