folderj
folderj

Reputation: 138

Group by and count unique values in several columns in R

I see a lot of examples of how to count values for one column. I can't find a solution for counting for several columns.

I have data like

city col1 col2 col3 col4 

I want to group by city and count unique values in col1, col2, col3...

aggregate(. ~ city, hh2, function(x) length(unique(x)))

I can count using aggregate, but it replaces city names with numbers and it's unclear how to revert it.

Upvotes: 1

Views: 1619

Answers (2)

Jon Spring
Jon Spring

Reputation: 66490

Here's an approach using dplyr::across, which is a handy way to calculate across multiple columns:

my_data <- data.frame(
  city = c(rep("A", 3), rep("B", 3)),
  col1 = 1:6,
  col2 = 0,
  col3 = c(1:3, 4, 4, 4),
  col4 = 1:2
)

library(dplyr)
my_data %>%
  group_by(city) %>%
  summarize(across(col1:col4, n_distinct))

# A tibble: 2 x 5
  city   col1  col2  col3  col4
* <chr> <int> <int> <int> <int>
1 A         3     1     3     2
2 B         3     1     1     2

Upvotes: 3

Kent Orr
Kent Orr

Reputation: 504

Looks to me like tidy data is what you're after. Here's an example with the tidyverse and subset of the mpg data set in ggplot2.

library(tidyverse)

data <- mpg[c("model", 'cty', 'hwy')]

head(data) #to see the initial data layout.

data %>%
  pivot_longer(cols = c('cty', 'hwy'), names_to = 'cat', values_to = 'values') %>%
  group_by(model, cat) %>%
  summarise(avg = mean(values))

Upvotes: 0

Related Questions