Reputation: 675

Counting unique values using variables defined by a secondary data frame

I have two data frames, one data frame states the variables that I need to get unique counts for from the second data frame.

Below is the first data frame with the variables I need to count in the second data frame:

df <- data.frame("ID" = c("Feature1", "Feature3", "Feature5"))
        ID
1 Feature1
2 Feature3
3 Feature5

Below is the second data frame:

 df2 <- data.frame("Feature1" = c(1,3,4), "Feature2" = c(2,6,4), "Feature3" = c(8,4,7), "Feature4" = c(1,8,4), "Feature5" = c(9,9,9))

  Feature1 Feature2 Feature3 Feature4 Feature5
1        1        2        8        1        9
2        3        6        4        8        9
3        4        4        7        4        9

I can get unique counts from the columns of interest in dplyr using the count function and manually enter the variables I need to count (as per below).

df2 %>% count(Feature1) 
  Feature1 n
1        1 1
2        3 1
3        4 1
df2 %>% count(Feature1) 
  Feature1 n
1        1 1
2        3 1
3        4 1
df2 %>% count(Feature3) 
  Feature3 n
1        4 1
2        7 1
3        8 1
df2 %>% count(Feature5)
  Feature5 n
1        9 3

But I would like to have this occur in a more automated way, and if possible would also like to have the results returned in a single data frame, such as below:

  Feature1 n Feature2   n Feature3   n 
1        1 1        2   1        8   3
2        3 1        6   1        4  NA
3        4 1        4   1        7  NA

Upvotes: 0

Answers (3)

Mike V

Reputation: 1364

Another way you can do

library(dplyr)
library(tidyr)
df3 <- df %>% 
  left_join(df2 %>% gather("ID", "value1"), by = "ID") %>%
  group_by(ID, value1) %>% 
  summarise(tol = n())

Upvotes: 1

s_baldur

Reputation: 33488

One base R option:

lapply(
  seq_along(df2), 
  function(i) setNames(data.frame(table(df2[[i]])[1L:n]), paste0(names(df2)[i], c("", "_n")))
) %>% 
  do.call(cbind, .)

#   Feature1 Feature1_n Feature2 Feature2_n Feature3 Feature3_n Feature4 Feature4_n Feature5 Feature5_n
# 1        1          1        2          1        4          1        1          1        9          3
# 2        3          1        4          1        7          1        4          1     <NA>         NA
# 3        4          1        6          1        8          1        8          1     <NA>         NA

Upvotes: 1

Ronak Shah

Reputation: 388982

You could select the columns from df$ID, get the data into long format and count :

library(dplyr)

df2 %>%
  select(all_of(df$ID)) %>%
  tidyr::pivot_longer(cols = everything()) %>%
  count(name, value)

#   name     value     n
#  <chr>    <dbl> <int>
#1 Feature1     1     1
#2 Feature1     3     1
#3 Feature1     4     1
#4 Feature3     4     1
#5 Feature3     7     1
#6 Feature3     8     1
#7 Feature5     9     3

Upvotes: 3

Counting unique values using variables defined by a secondary data frame

Answers (3)

Related Questions