Rory Kirchner
Rory Kirchner

Reputation: 333

summarizing counts of a factor with dplyr

I want to group a data frame by a column (owner) and output a new data frame that has counts of each type of a factor at each observation. The real data frame is fairly large, and there are 10 different factors.

Here is some example input:

library(dplyr)
df = tbl_df(data.frame(owner=c(0,0,1,1), obs1=c("quiet", "loud", "quiet", "loud"), obs2=c("loud", "loud", "quiet", "quiet")))

  owner  obs1  obs2
1     0 quiet  loud
2     0  loud  loud
3     1 quiet quiet
4     1  loud quiet

I was looking for output that looks like this:

out = data.frame(owner=c("0", "0", "1", "1"), observation=c("obs1", "obs2", "obs1", "obs2"), quiet=c(1, 0, 1, 2), loud=c(1, 2, 1, 0))

  owner observation quiet loud
1     0        obs1     1    1
2     0        obs2     0    2
3     1        obs1     1    1
4     1        obs2     2    0

Melting gets me partway there:

melted = tbl_df(melt(df, id=c("owner")))

  owner variable value
1     0     obs1 quiet
2     0     obs1  loud
3     1     obs1 quiet
4     1     obs1  loud
5     0     obs2  loud
6     0     obs2  loud
7     1     obs2 quiet
8     1     obs2 quiet

But what's the last step? If 'value' was a numeric, I'd just go:

melted %>% group_by(owner, variable) %>% summarise(counts=sum(value))

Thanks so much!

Upvotes: 23

Views: 65708

Answers (3)

Sam Clifford
Sam Clifford

Reputation: 436

In 2017 the answer is

library(dplyr)
library(tidyr)

gather(df, key, value, -owner) %>%
  group_by(owner, key, value) %>%
  tally %>% 
  spread(value, n, fill = 0)

Which gives output

Source: local data frame [4 x 4]
Groups: owner, key [4]

  owner   key  loud quiet
* <dbl> <chr> <dbl> <dbl>
1     0  obs1     1     1
2     0  obs2     2     0
3     1  obs1     1     1
4     1  obs2     0     2

In 2019 the answer is:

gather(df, key, value, -owner) %>% 
    count(owner, key, value) %>% 
    spread(value, n, fill = 0)

Upvotes: 32

black_sheep07
black_sheep07

Reputation: 2368

If you wanted to forego the dplyr, you can split into lists.

df <- split(df, list(df[[obs1]], df[[obs2]])

If you wanted the count, you just create an sapply or lapply call to run through the lists and get the count of each one. Or literally any other function you want.

Upvotes: 3

akrun
akrun

Reputation: 887991

You could use tidyr with dplyr

library(dplyr)
library(tidyr)

 df %>%
 gather(observation, Val, obs1:obs2) %>% 
 group_by(owner,observation, Val) %>% 
 summarise(n= n()) %>%
 ungroup() %>%
 spread(Val, n, fill=0)

which gives the output

  #    owner observation loud quiet
  #1     0        obs1    1     1
  #2     0        obs2    2     0
  #3     1        obs1    1     1
  #4     1        obs2    0     2

Upvotes: 29

Related Questions