Ângelo

Reputation: 169

Counting logical values with summarise in R

I have a data frame with an id column and a column of Y and N values. I would like to create two columns, one with the total Y count and one with the total N count for each id. I tried this with dplyr's summarise function:

 group_by(id) %>%
 summarise(total_not = count(column_y_e_n == "N"),
           total_yes = count(column_y_e_n == "Y")

but got this error message:

Error in summarise_impl(.data, dots)

Any suggestions?

Upvotes: 1

Views: 3668

Answers (4)

davsjob

Reputation: 1960

I usually want to do everything in tidyverse. But in this case the base R solution seems appropriate:

dfr <- data.frame(
  id =  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

table(dfr)

gives you:

   column_y_e_n
id  N Y
  1 1 4
  2 3 2
  3 3 0
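If the counts are needed as ordinary columns rather than as a table object, one possible follow-up is to convert the table. This is only a sketch and not part of the answer above; the names counts, total_not and total_yes are chosen for illustration.

```r
dfr <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  column_y_e_n = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

# as.data.frame.matrix() keeps the wide layout: one row per id, one column per value
counts <- as.data.frame.matrix(table(dfr))

# The table columns come out sorted as N, Y, so rename them in that order
names(counts) <- c("total_not", "total_yes")

# The ids end up as row names (character); copy them into a regular column
counts$id <- rownames(counts)

print(counts)
```

Note that the id column is character after this conversion, so it may need as.numeric() before joining back to other data.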

Upvotes: 0

Ângelo

Reputation: 169

I replaced the count function with sum and it worked. The comparison column_y_e_n == "N" returns a logical vector, and sum counts its TRUE values.

 df %>%    # df is a placeholder for your data frame
   group_by(id) %>%
   summarise(total_not = sum(column_y_e_n == "N"),
             total_yes = sum(column_y_e_n == "Y"))
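For anyone who wants to try this, here is a minimal self-contained sketch of the same idea. The data frame dfr and its values are made up for illustration and are not my real data.

```r
library(dplyr)

# Hypothetical sample data (not from the question)
dfr <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  column_y_e_n = c("Y", "N", "Y", "N", "N", "Y")
)

# A comparison returns TRUE/FALSE; sum() treats TRUE as 1, so it counts matches
result <- dfr %>%
  group_by(id) %>%
  summarise(
    total_not = sum(column_y_e_n == "N"),
    total_yes = sum(column_y_e_n == "Y")
  )

print(result)
```

An id with no matching rows simply gets 0, because summing an all-FALSE vector gives 0.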

Upvotes: 3

Kris Williams

Reputation: 122

Slight variation on original answer from Harro:

library(dplyr)  # group_by(), summarize(), n() and %>%
library(tidyr)  # spread()

dfr <- data.frame(
  id =  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  bool = c("Y", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "N")
)

dfrSummary <- dfr %>% 
  group_by(
    id, bool
    ) %>% 
  summarize(
    count = n()
    ) %>% 
  spread(
    key = bool, 
    value = count, 
    fill = 0
    )
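In newer tidyr releases spread() is superseded by pivot_wider(), so the same reshaping could probably be written as below. This is a sketch under that assumption, with a smaller made-up data frame; count() with its name argument stands in for the group_by() and summarize() steps.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data, smaller than the one above
dfr <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  bool = c("Y", "N", "Y", "N", "N", "N")
)

dfrSummary <- dfr %>%
  # one row per id/bool combination, with the row count in a column named "count"
  count(id, bool, name = "count") %>%
  # one column per bool value; missing combinations are filled with 0
  pivot_wider(
    names_from = bool,
    values_from = count,
    values_fill = list(count = 0)
  )

print(dfrSummary)
```

The named list for values_fill is used because older tidyr versions expect that form; recent versions should also accept a plain values_fill = 0.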

Upvotes: 0

Henry Cyranka

Reputation: 3060

I would approach the problem using group_by and tally(). Or you can skip the middle step and use count directly.

library(tidyverse)

##Fake data
df <- tibble(
    id = rep(1:20,each = 10),
    column_y_e_n = sample(c("Y", "N"), 200, replace = TRUE)
)

##group_by() + tally()
df_2 <- df %>%
    group_by(id, column_y_e_n) %>%
    tally() %>%
    spread(column_y_e_n, n) %>%
    magrittr::set_colnames(c("id", "total_not", "total_yes"))


df_2

#direct method
df_3 <- df %>%
    count(id, column_y_e_n) %>%
    spread(column_y_e_n, n) %>%
    magrittr::set_colnames(c("id", "total_not", "total_yes"))

df_3

In both versions, the last two steps spread the counts into one column per value (N and Y) and then rename the columns.

Upvotes: 0
