Runner Bean
Runner Bean

Reputation: 5155

R count values in column where another column is unique

I have a data frame 'df' like

      user_id     value
1      t34        A
2      t34        A
3      t87        A
4      t55        B
5      t55        B
6      t76        B
7      t99        A

if i do

table(df$value)

I get

 A    B
 4    3

but I only want to count once per user, something like

table(df$value WHERE user_id IS unique)

and get

 A    B
 3    2

How can this be done?

Upvotes: 3

Views: 2501

Answers (3)

www
www

Reputation: 39154

The solution submitted are all perfect. Here I would like to provide an approach using the duplicated function in base R. duplicated function can determine which rows are duplicates of other rows in a data frame. It can take a data frame as the input argument and return a logical vector.

# Create the example data frame
df <- data.frame(user_id = c("t34", "t34", "t87", "t55", "t55", "t76", "t99"),
                 value = c("A", "A", "A", "B", "B", "B", "A"),
                 stringsAsFactors = FALSE)

# Use the duplicate function
uniq_df <- df[!duplicated(df), ]
# Count number of value in uniq_df
table(uniq_df$value)

A B 
3 2

Upvotes: 1

akrun
akrun

Reputation: 886938

Here is an option using data.table

library(data.table)
unique(setDT(df))[, .N, value]
#   value N
#1:     A 3
#2:     B 2

Or with tidyverse

library(dplyr)
distinct(df) %>% 
           count(value)
# A tibble: 2 × 2
#  value     n
#   <chr> <int>
#1     A     3
#2     B     2

Upvotes: 3

Rich Scriven
Rich Scriven

Reputation: 99321

You can take the unique rows first, then tabulate.

table(unique(df)$value)
#
# A B 
# 3 2 

If you have other columns, then you can take a column subset first, then tabulate.

with(unique(df[c("user_id", "value")]), table(value))
# value
# A B 
# 3 2 

Upvotes: 6

Related Questions