Lmm
Lmm

Reputation: 413

Grouping by multiple factors and summarizing counts of factors

I have a bunch of categorical ship "Type" data, e.g. passenger, fishing, cargo etc. within different distances offshore (DOS, e.g 0-12 nm, 0-25 nm etc.) for different months of the year.

Initially I want to get a count of the number of Type, e.g. passenger, ships for each DOS for the whole year/data set. I then want to do the same for each month of the year.

I guess this would be some kind of group_by function followed by summaries?? but I am having a hart time trying to get the output as I am not that great using dplyr yet.

some things I have tried:

ships <- df %>% group_by(DOS, Type)
shipc <- summarize(ships, count = n())

df1 <- gather(df, Type, DOS) %>% count(Type, DOS) %>% spread(DOS, n, fill = 0)

But I am pretty sure it is not working, because I do not understand the syntax properly....

Here is some dummy data:

df <- structure(list(Type = c("Cargo ship", "Cargo ship", "Cargo ship", 
"Cargo ship", "Cargo ship", "Cargo ship", "Fishing", "Fishing", 
 "Fishing", "Fishing", "Fishing", "Cargo ship", "Cargo ship", 
 "Cargo ship", "Cargo ship", "Cargo ship", "Fishing", "Fishing", 
"Fishing", "Fishing", "Fishing", "Fishing", "Fishing", "Fishing", 
"Fishing", "Cargo ship:DG,HS,MP(A)", "Cargo ship", "Cargo ship", 
"Fishing", "Fishing", "Fishing", "Fishing", "Fishing", "Tanker", 
 "Cargo ship", "Cargo ship", "Fishing", "Fishing", "Cargo 
 ship:DG,HS,MP(A)", 
 "Cargo ship:DG,HS,MP(D)", "Cargo ship:DG,HS,MP(D)", "Cargo 
 ship:DG,HS,MP(D)", 
 "Cargo ship"), DOS = c("0-100", "0-50", "0-25", "0-100", "0-50", 
 "0-25", "0-100", "0-25", "0-12", "0-50", "0-100", "0-50", "0-100", 
 "0-25", "0-50", "0-100", "0-50", "0-25", "0-50", "0-100", "0-25", 
 "0-100", "0-100", "0-50", "0-25", "0-100", "0-100", "0-50", "0-100", 
 "0-50", "0-25", "0-100", "0-100", "0-100", "0-50", "0-100", "0-100", 
 "0-100", "0-100", "0-25", "0-50", "0-100", "0-100"), Month = c("May", 
 "May", "May", "May", "May", "May", "May", "May", "May", "May", 
 "June", "June", "June", "June", "June", "June", "June", "June", 
 "June", "June", "June", "August", "August", "August", "August", 
 "August", "August", "August", "August", "August", "August", "August", 
 "January", "January", "January", "January", "January", "January", 
 "January", "January", "January", "January", "January"), Year = c(2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 
 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019)), row.names = c(NA, 
-43L), class = c("tbl_df", "tbl", "data.frame"))

What I want is the Type category, the DOS and the count of the total ships which fall into those unique identifiers. I then further want that grouped by month and year.

Upvotes: 0

Views: 56

Answers (1)

akrun
akrun

Reputation: 886938

Not clear about the expected. Based on the description, group by all the columns (group_by_all), get the frequency count (n()) and spread to 'wide' format

library(dplyr)
df %>% 
   group_by_all %>% 
   summarise(n = n()) %>% 
   spread(DOS, n, fill = 0)

Or use count (group_by + summarise) and spread

df %>% 
  dplyr::count(Type, DOS, Month, Year) %>% 
  spread(DOS, n, fill = 0)

Upvotes: 1

Related Questions