ununplugged
ununplugged

Reputation: 11

Creating monthly total counts and monthly counts filtered by entry type in dplyr

I've been stuck on this for a couple sittings now and figured it might be more time efficient to ask. The answer is likely something relatively simple I'm not aware of as this seems like a common task, though I could find no similar online examples.

On a list of activities people have done, I am trying to get a condensed activity count list, as group_by(Person) %>% summarise() would be used for, for each person. I am trying to achieve the following end result:

1 person per row

Columns (for each person):

-Total activities

-Total for each type

-Total for each subtype (tallying combinations of types + subtypes doesn't matter fyi)

-Total count for each month (so each month is a column)

-Total type count for each month (12 mos X 2 Types = 24 more columns)

-total subtype count for each month (12 mos X 3 subtypes = 36 more columns)

If there is no count for a month (as there isn't for July-December in the example), it needs to read 0 ie the month isn't excluded.

activities <- data.frame(
  Person = c(rep("Person A", 3), rep("Person B", 3)), 
  Month = seq(as.Date("2020/1/1"), by = "month", length.out = 6),
  Activity.Type = as.factor(seq(1:2)),
  Activity.Sub.Type = as.factor(seq(1:3))
  )
activities

All of the month columns may seem odd, but I'm adding this data to an annual productivity metric for each person for a regression model. This seemed like the best way to structure everything for that use case.

Let me know if you have any questions and thanks for your help.

Upvotes: 1

Views: 137

Answers (1)

TarJae
TarJae

Reputation: 78927

I am not sure. But maybe you are looking for something like this:

library(dplyr)
library(tidyr)

df_month <- activities %>% 
    group_by(Person, Month) %>% 
    add_count() %>% 
    add_count(Activity.Type) %>% 
    add_count(Activity.Sub.Type) %>% 
    summarise(Total_activites = sum(n), Activity.Type=sum(nn), Activity.Sub.Type=sum(nnn)) %>% 
    pivot_wider(
        names_from = Month, 
        values_from = Total_activites:Activity.Sub.Type
    )

activities %>% 
    group_by(Person) %>% 
    add_count() %>% 
    add_count(Activity.Type) %>% 
    add_count(Activity.Sub.Type) %>% 
    summarise(Total_activites = sum(n), Activity.Type=sum(nn), Activity.Sub.Type=sum(nnn)) %>% 
    right_join(df_month, by="Person")

Person   Total_activites Activity.Type Activity.Sub.Type `Total_activite~ `Total_activite~ `Total_activite~ `Total_activite~ `Total_activite~ `Total_activite~
  <chr>              <int>         <int>             <int>            <int>            <int>            <int>            <int>            <int>            <int>
1 Person A               9             5                 3                1                1                1               NA               NA               NA
2 Person B               9             5                 3               NA               NA               NA                1                1                1
# ... with 12 more variables: Activity.Type_2020-01-01 <int>, Activity.Type_2020-02-01 <int>, Activity.Type_2020-03-01 <int>, Activity.Type_2020-04-01 <int>,
#   Activity.Type_2020-05-01 <int>, Activity.Type_2020-06-01 <int>, Activity.Sub.Type_2020-01-01 <int>, Activity.Sub.Type_2020-02-01 <int>,
#   Activity.Sub.Type_2020-03-01 <int>, Activity.Sub.Type_2020-04-01 <int>, Activity.Sub.Type_2020-05-01 <int>, Activity.Sub.Type_2020-06-01 <int>

Upvotes: 1

Related Questions