user1039698

Reputation: 173

Summarizing with condition in R dplyr

I have a dataset of the time people spent on various projects, by month and category, something like this:

person | project | date | time 
--------------------------------
  A   |    a     |  Jan  |  1
  A   |    b     |  Jan  |  2
  A   |    c     |  Jan  |  3
  A   |    d     |  Feb  |  1
  B   |    a     |  Feb  |  2
  B   |    b     |  Feb  |  3
  B   |    c     |  Feb  |  1
--------------------------------

I need a summary by person and date with the total time spent and the portion of that time spent on one of the projects (say, "a"), i.e.:

person |   date     |  Total | project:a 
--------------------------------
  A    |    Jan     |  6     |  1
  A    |    Feb     |  1     |  0
  B    |    Jan     |  0     |  0
  B    |    Feb     |  6     |  2
--------------------------------

I have a short snippet that I pieced together from similar questions, but it doesn't give correct results:

data %>% group_by(person, date) %>% summarise(total = sum(time), `project:a` = sum(time[project == "a"]))

It calculates the total sum correctly, but not the conditional sum, which mostly comes back as NA. What could be the issue? Thanks.

Upvotes: 0

Views: 864

Answers (2)

akrun

Reputation: 887901

We can use type_convert from readr to re-guess the types of character columns before summarising:

library(dplyr)
library(readr)
df %>%
  type_convert() %>%
  group_by(person, date) %>%
  summarise(Total = sum(time), project_a = sum(time[project == "a"]))
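
As a minimal, self-contained sketch of this approach, the snippet below rebuilds the sample data from the question with every column stored as character. That storage type is an assumption about how the real data was read in, and the names `df` and `res` are only illustrative.

```r
library(dplyr)
library(readr)

# Sample data from the question. Storing every column as character is an
# assumption made only for illustration.
df <- tibble(
  person  = c("A", "A", "A", "A", "B", "B", "B"),
  project = c("a", "b", "c", "d", "a", "b", "c"),
  date    = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"),
  time    = c("1", "2", "3", "1", "2", "3", "1")
)

res <- df %>%
  type_convert() %>%                         # re-guess types: time becomes numeric
  group_by(person, date) %>%
  summarise(
    Total     = sum(time),
    project_a = sum(time[project == "a"]),   # sums to 0 when no row matches
    .groups   = "drop"
  )

res
```

Only person/date combinations that actually occur in the data appear in `res`, so this version has no row for B in Jan.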

Upvotes: 1

Ronak Shah

Reputation: 389275

Try using type.convert if you have factor columns.

df %>% 
  type.convert %>% 
  group_by(person, date, .drop = FALSE) %>% 
  summarise(Total = sum(time), project_a = sum(time[project == "a"]))

#  person date  Total project_a
#  <fct>  <fct> <int>     <int>
#1 A      Feb       1         0
#2 A      Jan       6         1
#3 B      Feb       6         2
#4 B      Jan       0         0
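
The B/Jan row of zeros in that output comes from `.drop = FALSE`, which keeps empty combinations of factor levels. It has no effect on character grouping columns. Here is a small sketch of that behaviour, assuming the grouping columns are factors and `time` is already integer (the data frame is rebuilt from the question's sample purely for illustration):

```r
library(dplyr)

# Sample data from the question, with text columns stored as factors
# (an assumption) so that empty level combinations can be kept.
df <- data.frame(
  person  = c("A", "A", "A", "A", "B", "B", "B"),
  project = c("a", "b", "c", "d", "a", "b", "c"),
  date    = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"),
  time    = c(1L, 2L, 3L, 1L, 2L, 3L, 1L),
  stringsAsFactors = TRUE
)

res <- df %>%
  group_by(person, date, .drop = FALSE) %>%  # keep person/date pairs with no rows
  summarise(
    Total     = sum(time),                   # sum over zero rows is 0
    project_a = sum(time[project == "a"]),
    .groups   = "drop"
  )

res
```

Without `.drop = FALSE`, the same pipeline would return three rows and omit B/Jan.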

Upvotes: 3
