Grouping by multiple factors and summarizing counts of factors

Question

I have a bunch of categorical ship "Type" data, e.g. passenger, fishing, cargo etc. within different distances offshore (DOS, e.g 0-12 nm, 0-25 nm etc.) for different months of the year.

Initially I want to get a count of the number of Type, e.g. passenger, ships for each DOS for the whole year/data set. I then want to do the same for each month of the year.

I guess this would be some kind of group_by function followed by summaries?? but I am having a hart time trying to get the output as I am not that great using dplyr yet.

some things I have tried:

ships <- df %>% group_by(DOS, Type)
shipc <- summarize(ships, count = n())

df1 <- gather(df, Type, DOS) %>% count(Type, DOS) %>% spread(DOS, n, fill = 0)

But I am pretty sure it is not working, because I do not understand the syntax properly....

Here is some dummy data:

df <- structure(list(Type = c("Cargo ship", "Cargo ship", "Cargo ship", 
"Cargo ship", "Cargo ship", "Cargo ship", "Fishing", "Fishing", 
 "Fishing", "Fishing", "Fishing", "Cargo ship", "Cargo ship", 
 "Cargo ship", "Cargo ship", "Cargo ship", "Fishing", "Fishing", 
"Fishing", "Fishing", "Fishing", "Fishing", "Fishing", "Fishing", 
"Fishing", "Cargo ship:DG,HS,MP(A)", "Cargo ship", "Cargo ship", 
"Fishing", "Fishing", "Fishing", "Fishing", "Fishing", "Tanker", 
 "Cargo ship", "Cargo ship", "Fishing", "Fishing", "Cargo 
 ship:DG,HS,MP(A)", 
 "Cargo ship:DG,HS,MP(D)", "Cargo ship:DG,HS,MP(D)", "Cargo 
 ship:DG,HS,MP(D)", 
 "Cargo ship"), DOS = c("0-100", "0-50", "0-25", "0-100", "0-50", 
 "0-25", "0-100", "0-25", "0-12", "0-50", "0-100", "0-50", "0-100", 
 "0-25", "0-50", "0-100", "0-50", "0-25", "0-50", "0-100", "0-25", 
 "0-100", "0-100", "0-50", "0-25", "0-100", "0-100", "0-50", "0-100", 
 "0-50", "0-25", "0-100", "0-100", "0-100", "0-50", "0-100", "0-100", 
 "0-100", "0-100", "0-25", "0-50", "0-100", "0-100"), Month = c("May", 
 "May", "May", "May", "May", "May", "May", "May", "May", "May", 
 "June", "June", "June", "June", "June", "June", "June", "June", 
 "June", "June", "June", "August", "August", "August", "August", 
 "August", "August", "August", "August", "August", "August", "August", 
 "January", "January", "January", "January", "January", "January", 
 "January", "January", "January", "January", "January"), Year = c(2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 
 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019)), row.names = c(NA, 
-43L), class = c("tbl_df", "tbl", "data.frame"))

What I want is the Type category, the DOS and the count of the total ships which fall into those unique identifiers. I then further want that grouped by month and year.

akrun · Accepted Answer

Not clear about the expected. Based on the description, group by all the columns (group_by_all), get the frequency count (n()) and spread to 'wide' format

library(dplyr)
df %>% 
   group_by_all %>% 
   summarise(n = n()) %>% 
   spread(DOS, n, fill = 0)

Or use count (group_by + summarise) and spread

df %>% 
  dplyr::count(Type, DOS, Month, Year) %>% 
  spread(DOS, n, fill = 0)

Grouping by multiple factors and summarizing counts of factors

Answers (1)

Related Questions