OverFlow Police
OverFlow Police

Reputation: 861

count number of unique elements in a given column after filtering in R

Lets say I have a data table like:

year  city
2026  NYC
2026  NYC
2026  NYC
2026  LA
2027  LA
2028  NYC
2028  NYC

Which can be created by:

dt <- structure(list(location = c("NYC", "NYC", "NYC","LA", "LA", "NYC", "NYC"), 
                     year = c(2026, 2026, 2026, 2026, 2027, 2028, 2028)),
                     class = "data.table", 
                     row.names = c(NA, -7L))

I want to count number of unique cities in a given year. Lets say 2026. So, the outcome in this case would be 2, because there is just NYC and LA. What goes in the last line of following?

dt %>% 
filter(year == 2026) %>%
What goes here?

Upvotes: 0

Views: 47

Answers (2)

akrun
akrun

Reputation: 887501

We can use data.table

library(data.table)
setDT(dt)[year == 2026, .(count = uniqueN(location))]
#   count
#1:     2

Or with `base R

length(table(subset(dt, year == 2026, select = location)))
#[1] 2

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389135

We can use n_distinct to get the number of unique values

library(dplyr)
dt %>%
  filter(year == 2026) %>%
  summarise(count = n_distinct(city)) 

#  count
#1     2

Or add the filtering step in summarise itself

dt %>% summarise(count = n_distinct(city[year == 2026]))

Or if we want it as a vector we can add pull(count)

dt %>%
  filter(year == 2026) %>%
  summarise(count = n_distinct(city)) %>%
  pull(count)
#[1] 2

In base R, this is equivalent to

length(unique(dt$city[dt$year == 2026]))
#[1] 2

Upvotes: 1

Related Questions