Reputation: 11
I've been stuck on this for a couple sittings now and figured it might be more time efficient to ask. The answer is likely something relatively simple I'm not aware of as this seems like a common task, though I could find no similar online examples.
On a list of activities people have done, I am trying to get a condensed activity count list, as group_by(Person) %>% summarise() would be used for, for each person. I am trying to achieve the following end result:
1 person per row
Columns (for each person):
-Total activities
-Total for each type
-Total for each subtype (tallying combinations of types + subtypes doesn't matter fyi)
-Total count for each month (so each month is a column)
-Total type count for each month (12 mos X 2 Types = 24 more columns)
-total subtype count for each month (12 mos X 3 subtypes = 36 more columns)
If there is no count for a month (as there isn't for July-December in the example), it needs to read 0 ie the month isn't excluded.
activities <- data.frame(
Person = c(rep("Person A", 3), rep("Person B", 3)),
Month = seq(as.Date("2020/1/1"), by = "month", length.out = 6),
Activity.Type = as.factor(seq(1:2)),
Activity.Sub.Type = as.factor(seq(1:3))
)
activities
All of the month columns may seem odd, but I'm adding this data to an annual productivity metric for each person for a regression model. This seemed like the best way to structure everything for that use case.
Let me know if you have any questions and thanks for your help.
Upvotes: 1
Views: 137
Reputation: 78927
I am not sure. But maybe you are looking for something like this:
library(dplyr)
library(tidyr)
df_month <- activities %>%
group_by(Person, Month) %>%
add_count() %>%
add_count(Activity.Type) %>%
add_count(Activity.Sub.Type) %>%
summarise(Total_activites = sum(n), Activity.Type=sum(nn), Activity.Sub.Type=sum(nnn)) %>%
pivot_wider(
names_from = Month,
values_from = Total_activites:Activity.Sub.Type
)
activities %>%
group_by(Person) %>%
add_count() %>%
add_count(Activity.Type) %>%
add_count(Activity.Sub.Type) %>%
summarise(Total_activites = sum(n), Activity.Type=sum(nn), Activity.Sub.Type=sum(nnn)) %>%
right_join(df_month, by="Person")
Person Total_activites Activity.Type Activity.Sub.Type `Total_activite~ `Total_activite~ `Total_activite~ `Total_activite~ `Total_activite~ `Total_activite~
<chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 Person A 9 5 3 1 1 1 NA NA NA
2 Person B 9 5 3 NA NA NA 1 1 1
# ... with 12 more variables: Activity.Type_2020-01-01 <int>, Activity.Type_2020-02-01 <int>, Activity.Type_2020-03-01 <int>, Activity.Type_2020-04-01 <int>,
# Activity.Type_2020-05-01 <int>, Activity.Type_2020-06-01 <int>, Activity.Sub.Type_2020-01-01 <int>, Activity.Sub.Type_2020-02-01 <int>,
# Activity.Sub.Type_2020-03-01 <int>, Activity.Sub.Type_2020-04-01 <int>, Activity.Sub.Type_2020-05-01 <int>, Activity.Sub.Type_2020-06-01 <int>
Upvotes: 1