datazang
datazang

Reputation: 1169

Count distinct among the rows and aggregate

I have a data set as shown below:

data <- tribble(
  ~top_1, ~top_2, ~top_3,
  "A",     "B",    "C",
  "B",     "B",    "B",   
  "C",     "B",    "C",
  "A",     "B",    "B",
  "A",     "A",    "A",
  "B",     "B",    "A",
  "C",     "A",    "C",
  "A",     "A",    "A",
  "A",     "C",    "B",
  "B",     "B",    "C",
)

And now, I want to count distinct the rows for each column and have a new data set something like this:

new_data <- tribble(
  ~product, ~top_1, ~top_2, ~top_3,
     "A",    .50,    .30,     .30,
     "B",    .30,    .60,     .30,
     "C",    .20,    .10,     .40,
)

Could you please help me to be able to create this data?

Upvotes: 2

Views: 99

Answers (3)

akrun
akrun

Reputation: 886938

Here is one option where we gather into 'long' format, get the count and reshape to 'wide' format with pivot_wider

library(dplyr)
library(tidyr)
data %>%
   gather %>% 
   group_by_all %>% 
   count %>%
   group_by(key) %>%
   mutate(n = n/sum(n)) %>% 
   pivot_wider( names_from = key, values_from = n)
# A tibble: 3 x 4
# Groups:   value [3]
#  value top_1 top_2 top_3
#  <chr> <dbl> <dbl> <dbl>
#1 A       0.5   0.3   0.3
#2 B       0.3   0.6   0.3
#3 C       0.2   0.1   0.4

Upvotes: 2

tmfmnk
tmfmnk

Reputation: 39858

One base R option could be:

table(stack(data))/nrow(data)

values top_1 top_2 top_3
     A   0.5   0.3   0.3
     B   0.3   0.6   0.3
     C   0.2   0.1   0.4

And if you want it as a data.frame:

as.data.frame.matrix(table(stack(data))/nrow(data))

Upvotes: 3

d.b
d.b

Reputation: 32538

lvl = unique(unlist(data))
sapply(data, function(x) prop.table(table(factor(x, lvl))))
#  top_1 top_2 top_3
#A   0.5   0.3   0.3
#B   0.3   0.6   0.3
#C   0.2   0.1   0.4

Upvotes: 4

Related Questions