Reputation: 413
Example data below...
I want to tally the number of points counted per month for each "type" (type is the kind of shipping vessel). So initially I want to summarize how many points of each vessel "type" are counted in each month in total, e.g. June has 5 points for fishing vessels.
Preferably using dplyr.
I have something like:
dfsum <- df %>% group_by(Month, Type) %>% tally()
This works well enough. However, I would also like to count by unique vessel IDs: a ship can have multiple points per month, but I would like to know how many unique vessels are present each month.
I could just add id to the grouping:
dfsum2 <- df %>% group_by(Month, id, Type) %>% tally()
However, this is less tidy and would be harder to work with on a larger data set. Rather, I want the result to show that in Feb there are 2 unique fishing vessels (using this example data). Is there a better way to extract this information?
Desired output:
Month Type n
Jan Fishing x
Feb Fishing x
Feb Sailing x
March Fishing x
Where x is the number of unique vessels (by id) of that type in that month.
#Dummy data
df<- structure(list(UTC_Time = structure(c(1L, 1L, 1L, 1L, 339L, 339L,
339L, 68L, 68L, 68L, 154L, 154L, 154L, 154L, 154L, 154L, 14L,
14L, 14L, 14L, 14L, 15L, 50L, 50L, 51L, 51L, 51L, 51L, 51L, 51L,
51L, 51L, 51L, 77L, 146L, 147L, 147L, 147L, 147L, 147L, 148L,
148L), .Label = c("2018-01-01 0:00:00", "2018-01-02 0:00:00",
"2018-01-03 0:00:00", "2018-01-04 0:00:00", "2018-01-05 0:00:00",
"2018-01-06 0:00:00", "2018-01-07 0:00:00", "2018-01-08 0:00:00",
"2018-01-09 0:00:00", "2018-01-10 0:00:00", "2018-01-11 0:00:00",
"2018-01-12 0:00:00", "2018-01-13 0:00:00", "2018-01-14 0:00:00",
"2018-01-15 0:00:00", "2018-01-16 0:00:00", "2018-01-17 0:00:00",
"2018-01-18 0:00:00", "2018-01-19 0:00:00", "2018-01-20 0:00:00",
"2018-01-21 0:00:00", "2018-01-22 0:00:00", "2018-01-23 0:00:00",
"2018-01-24 0:00:00", "2018-01-25 0:00:00", "2018-01-26 0:00:00",
"2018-01-27 0:00:00", "2018-01-28 0:00:00", "2018-01-29 0:00:00",
"2018-01-30 0:00:00", "2018-01-31 0:00:00", "2018-02-01 0:00:00",
"2018-02-02 0:00:00", "2018-02-03 0:00:00", "2018-02-04 0:00:00",
"2018-02-05 0:00:00", "2018-02-06 0:00:00", "2018-02-07 0:00:00",
"2018-02-08 0:00:00", "2018-02-09 0:00:00", "2018-02-10 0:00:00",
"2018-02-11 0:00:00", "2018-02-12 0:00:00", "2018-02-13 0:00:00",
"2018-02-14 0:00:00", "2018-02-15 0:00:00", "2018-02-16 0:00:00",
"2018-02-17 0:00:00", "2018-02-18 0:00:00", "2018-02-19 0:00:00",
"2018-02-20 0:00:00", "2018-02-21 0:00:00", "2018-02-22 0:00:00",
"2018-02-23 0:00:00", "2018-02-24 0:00:00", "2018-02-25 0:00:00",
"2018-02-26 0:00:00", "2018-02-27 0:00:00", "2018-02-28 0:00:00",
"2018-03-01 0:00:00", "2018-03-02 0:00:00", "2018-03-03 0:00:00",
"2018-03-04 0:00:00", "2018-03-05 0:00:00", "2018-03-06 0:00:00",
"2018-03-07 0:00:00", "2018-03-08 0:00:00", "2018-03-09 0:00:00",
"2018-03-10 0:00:00", "2018-03-11 0:00:00", "2018-03-12 0:00:00",
"2018-03-13 0:00:00", "2018-03-14 0:00:00", "2018-03-15 0:00:00",
"2018-03-16 0:00:00", "2018-03-17 0:00:00", "2018-03-18 0:00:00",
"2018-03-19 0:00:00", "2018-03-20 0:00:00", "2018-03-21 0:00:00",
"2018-03-22 0:00:00", "2018-03-23 0:00:00", "2018-03-24 0:00:00",
"2018-03-25 0:00:00", "2018-03-26 0:00:00", "2018-03-27 0:00:00",
"2018-03-28 0:00:00", "2018-03-29 0:00:00", "2018-03-30 0:00:00",
"2018-03-31 0:00:00", "2018-04-01 0:00:00", "2018-04-02 0:00:00",
"2018-04-03 0:00:00", "2018-04-04 0:00:00", "2018-04-05 0:00:00",
"2018-04-06 0:00:00", "2018-04-07 0:00:00", "2018-04-08 0:00:00",
"2018-04-09 0:00:00", "2018-04-10 0:00:00", "2018-04-11 0:00:00",
"2018-04-12 0:00:00", "2018-04-13 0:00:00", "2018-04-14 0:00:00",
"2018-04-15 0:00:00", "2018-04-16 0:00:00", "2018-04-17 0:00:00",
"2018-04-18 0:00:00", "2018-04-19 0:00:00", "2018-04-20 0:00:00",
"2018-04-21 0:00:00", "2018-04-22 0:00:00", "2018-04-23 0:00:00",
"2018-04-24 0:00:00", "2018-04-25 0:00:00", "2018-04-26 0:00:00",
"2018-04-27 0:00:00", "2018-04-28 0:00:00", "2018-04-29 0:00:00",
"2018-04-30 0:00:00", "2018-05-01 0:00:00", "2018-05-02 0:00:00",
"2018-05-03 0:00:00", "2018-05-04 0:00:00", "2018-05-05 0:00:00",
"2018-05-06 0:00:00", "2018-05-07 0:00:00", "2018-05-08 0:00:00",
"2018-05-09 0:00:00", "2018-05-10 0:00:00", "2018-05-11 0:00:00",
"2018-05-12 0:00:00", "2018-05-13 0:00:00", "2018-05-14 0:00:00",
"2018-05-15 0:00:00", "2018-05-16 0:00:00", "2018-05-17 0:00:00",
"2018-05-18 0:00:00", "2018-05-19 0:00:00", "2018-05-20 0:00:00",
"2018-05-21 0:00:00", "2018-05-22 0:00:00", "2018-05-23 0:00:00",
"2018-05-24 0:00:00", "2018-05-25 0:00:00", "2018-05-26 0:00:00",
"2018-05-27 0:00:00", "2018-05-28 0:00:00", "2018-05-29 0:00:00",
"2018-05-30 0:00:00", "2018-05-31 0:00:00", "2018-06-01 0:00:00",
"2018-06-02 0:00:00", "2018-06-03 0:00:00", "2018-06-04 0:00:00",
"2018-06-05 0:00:00", "2018-06-06 0:00:00", "2018-06-07 0:00:00",
"2018-06-08 0:00:00", "2018-06-09 0:00:00", "2018-06-10 0:00:00",
"2018-06-11 0:00:00", "2018-06-12 0:00:00", "2018-06-13 0:00:00",
"2018-06-14 0:00:00", "2018-06-15 0:00:00", "2018-06-16 0:00:00",
"2018-06-17 0:00:00", "2018-06-18 0:00:00", "2018-06-19 0:00:00",
"2018-06-20 0:00:00", "2018-06-21 0:00:00", "2018-06-22 0:00:00",
"2018-06-23 0:00:00", "2018-06-24 0:00:00", "2018-06-25 0:00:00",
"2018-06-26 0:00:00", "2018-06-27 0:00:00", "2018-06-28 0:00:00",
"2018-06-29 0:00:00", "2018-06-30 0:00:00", "2018-07-01 0:00:00",
"2018-07-02 0:00:00", "2018-07-03 0:00:00", "2018-07-04 0:00:00",
"2018-07-05 0:00:00", "2018-07-06 0:00:00", "2018-07-07 0:00:00",
"2018-07-08 0:00:00", "2018-07-09 0:00:00", "2018-07-10 0:00:00",
"2018-07-11 0:00:00", "2018-07-12 0:00:00", "2018-07-13 0:00:00",
"2018-07-14 0:00:00", "2018-07-15 0:00:00", "2018-07-16 0:00:00",
"2018-07-17 0:00:00", "2018-07-18 0:00:00", "2018-07-19 0:00:00",
"2018-07-20 0:00:00", "2018-07-21 0:00:00", "2018-07-22 0:00:00",
"2018-07-23 0:00:00", "2018-07-24 0:00:00", "2018-07-25 0:00:00",
"2018-07-26 0:00:00", "2018-07-27 0:00:00", "2018-07-28 0:00:00",
"2018-07-29 0:00:00", "2018-07-30 0:00:00", "2018-07-31 0:00:00",
"2018-08-01 0:00:00", "2018-08-02 0:00:00", "2018-08-03 0:00:00",
"2018-08-04 0:00:00", "2018-08-05 0:00:00", "2018-08-06 0:00:00",
"2018-08-07 0:00:00", "2018-08-08 0:00:00", "2018-08-09 0:00:00",
"2018-08-10 0:00:00", "2018-08-11 0:00:00", "2018-08-12 0:00:00",
"2018-08-13 0:00:00", "2018-08-14 0:00:00", "2018-08-15 0:00:00",
"2018-08-16 0:00:00", "2018-08-17 0:00:00", "2018-08-18 0:00:00",
"2018-08-19 0:00:00", "2018-08-20 0:00:00", "2018-08-21 0:00:00",
"2018-08-22 0:00:00", "2018-08-23 0:00:00", "2018-08-24 0:00:00",
"2018-08-25 0:00:00", "2018-08-26 0:00:00", "2018-08-27 0:00:00",
"2018-08-28 0:00:00", "2018-08-29 0:00:00", "2018-08-30 0:00:00",
"2018-08-31 0:00:00", "2018-09-01 0:00:00", "2018-09-02 0:00:00",
"2018-09-03 0:00:00", "2018-09-04 0:00:00", "2018-09-05 0:00:00",
"2018-09-06 0:00:00", "2018-09-07 0:00:00", "2018-09-08 0:00:00",
"2018-09-09 0:00:00", "2018-09-10 0:00:00", "2018-09-11 0:00:00",
"2018-09-12 0:00:00", "2018-09-13 0:00:00", "2018-09-14 0:00:00",
"2018-09-15 0:00:00", "2018-09-16 0:00:00", "2018-09-17 0:00:00",
"2018-09-18 0:00:00", "2018-09-19 0:00:00", "2018-09-20 0:00:00",
"2018-09-21 0:00:00", "2018-09-22 0:00:00", "2018-09-23 0:00:00",
"2018-09-24 0:00:00", "2018-09-25 0:00:00", "2018-09-26 0:00:00",
"2018-09-27 0:00:00", "2018-09-28 0:00:00", "2018-09-29 0:00:00",
"2018-09-30 0:00:00", "2018-10-01 0:00:00", "2018-10-02 0:00:00",
"2018-10-03 0:00:00", "2018-10-04 0:00:00", "2018-10-05 0:00:00",
"2018-10-06 0:00:00", "2018-10-07 0:00:00", "2018-10-08 0:00:00",
"2018-10-09 0:00:00", "2018-10-10 0:00:00", "2018-10-11 0:00:00",
"2018-10-12 0:00:00", "2018-10-13 0:00:00", "2018-10-14 0:00:00",
"2018-10-15 0:00:00", "2018-10-16 0:00:00", "2018-10-17 0:00:00",
"2018-10-18 0:00:00", "2018-10-19 0:00:00", "2018-10-20 0:00:00",
"2018-10-21 0:00:00", "2018-10-22 0:00:00", "2018-10-23 0:00:00",
"2018-10-24 0:00:00", "2018-10-25 0:00:00", "2018-10-26 0:00:00",
"2018-10-27 0:00:00", "2018-10-28 0:00:00", "2018-10-29 0:00:00",
"2018-10-30 0:00:00", "2018-10-31 0:00:00", "2018-11-01 0:00:00",
"2018-11-02 0:00:00", "2018-11-03 0:00:00", "2018-11-04 0:00:00",
"2018-11-05 0:00:00", "2018-11-06 0:00:00", "2018-11-07 0:00:00",
"2018-11-08 0:00:00", "2018-11-09 0:00:00", "2018-11-10 0:00:00",
"2018-11-11 0:00:00", "2018-11-12 0:00:00", "2018-11-13 0:00:00",
"2018-11-14 0:00:00", "2018-11-15 0:00:00", "2018-11-16 0:00:00",
"2018-11-17 0:00:00", "2018-11-18 0:00:00", "2018-11-19 0:00:00",
"2018-11-20 0:00:00", "2018-11-21 0:00:00", "2018-11-22 0:00:00",
"2018-11-23 0:00:00", "2018-11-24 0:00:00", "2018-11-25 0:00:00",
"2018-11-26 0:00:00", "2018-11-27 0:00:00", "2018-11-28 0:00:00",
"2018-11-29 0:00:00", "2018-11-30 0:00:00", "2018-12-01 0:00:00",
"2018-12-02 0:00:00", "2018-12-03 0:00:00", "2018-12-04 0:00:00",
"2018-12-05 0:00:00", "2018-12-06 0:00:00", "2018-12-07 0:00:00",
"2018-12-08 0:00:00", "2018-12-09 0:00:00", "2018-12-10 0:00:00",
"2018-12-11 0:00:00", "2018-12-12 0:00:00", "2018-12-13 0:00:00",
"2018-12-14 0:00:00", "2018-12-15 0:00:00", "2018-12-16 0:00:00",
"2018-12-17 0:00:00", "2018-12-18 0:00:00", "2018-12-19 0:00:00",
"2018-12-20 0:00:00", "2018-12-21 0:00:00", "2018-12-22 0:00:00",
"2018-12-23 0:00:00", "2018-12-24 0:00:00", "2018-12-25 0:00:00",
"2018-12-26 0:00:00", "2018-12-27 0:00:00", "2018-12-28 0:00:00",
"2018-12-29 0:00:00", "2018-12-30 0:00:00", "2018-12-31 0:00:00",
"2019-01-01 0:00:00"), class = "factor"), Type = structure(c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 17L, 17L, 17L, 4L, 12L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 17L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("Cargo ship",
"Cargo ship:DG,HS,MP(OS)", "Cargo ship:DG,HS,MP(X)", "Fishing",
"Law enforcement", "Local ship", "Passenger ship", "Passenger ship:DG,HS,MP(OS)",
"Passenger ship:DG,HS,MP(Y)", "Pilot", "Pleasure Craft", "Sailing",
"Search/rescue", "Ship", "Towing", "Towing(200/25)", "Tug"), class = "factor"),
Month = structure(c(5L, 5L, 5L, 5L, 3L, 3L, 3L, 8L, 8L, 8L,
7L, 7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L), .Label = c("Apr", "Aug", "Dec", "Feb", "Jan", "Jul",
"Jun", "Mar", "May", "Nov", "Oct", "Sep"), class = "factor"),
id = c(27L, 27L, 27L, 27L, 21L, 21L, 21L, 24L, 24L, 24L,
20L, 6L, 20L, 20L, 20L, 20L, 48L, 48L, 48L, 48L, 48L, 42L,
34L, 34L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 23L,
17L, 17L, 17L, 14L, 14L, 3L, 14L, 3L)), row.names = c(1L,
2L, 3L, 4L, 650L, 651L, 652L, 262L, 263L, 264L, 400L, 401L, 402L,
403L, 404L, 405L, 100L, 101L, 102L, 103L, 104L, 105L, 250L, 251L,
252L, 253L, 254L, 255L, 256L, 257L, 258L, 259L, 260L, 300L, 301L,
302L, 303L, 304L, 305L, 306L, 307L, 308L), class = "data.frame")
Upvotes: 0
Views: 727
Reputation: 39623
A base R approach could be the following (it can sometimes be fast):
#Code
result <- aggregate(Type~Month,df,function(x) length(unique(x)))
Output:
Month Type
1 Dec 1
2 Feb 1
3 Jan 1
4 Jun 2
5 Mar 1
6 May 1
Or maybe:
#Code 2
result2 <- aggregate(id~Month,df,function(x) length(unique(x)))
Output:
Month id
1 Dec 1
2 Feb 2
3 Jan 3
4 Jun 2
5 Mar 2
6 May 3
Based on the expected output you can try this:
#Code
new <- aggregate(id~Month+Type,data=df,function(x) length(unique(x)))
Output:
Month Type id
1 Dec Fishing 1
2 Feb Fishing 2
3 Jan Fishing 3
4 Jun Fishing 1
5 May Passenger ship 3
6 Jun Sailing 1
7 Mar Tug 2
Or using dplyr:
library(dplyr)
#Code
new <- df %>% group_by(Month,Type) %>% summarise(N=length(unique(id)))
Output:
# A tibble: 7 x 3
# Groups: Month [6]
Month Type N
<fct> <fct> <int>
1 Dec Fishing 1
2 Feb Fishing 2
3 Jan Fishing 3
4 Jun Fishing 1
5 Jun Sailing 1
6 Mar Tug 2
7 May Passenger ship 3
Upvotes: 2
Reputation: 887981
We can use n_distinct to find the number of unique 'Type' values by 'Month':
library(dplyr)
df %>%
  group_by(Month) %>%
  summarise(n = n_distinct(Type))
-output
# A tibble: 6 x 2
# Month n
# <fct> <int>
#1 Dec 1
#2 Feb 1
#3 Jan 1
#4 Jun 2
#5 Mar 1
#6 May 1
If it is based on 'id':
df %>%
  group_by(Month) %>%
  summarise(n = n_distinct(id))
-output
# A tibble: 6 x 2
# Month n
# <fct> <int>
#1 Dec 1
#2 Feb 2
#3 Jan 3
#4 Jun 2
#5 Mar 2
#6 May 3
Or another option is to get the distinct rows and use count:
df %>%
  distinct(Month, Type) %>%
  count(Month)
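Since the question asks for unique vessels per month and type, both ideas can probably be extended by grouping on 'Month' and 'Type' together and counting distinct 'id'. A minimal sketch, assuming a small stand-in data frame (`vessels`, with hypothetical values, not the full `df`) and dplyr >= 1.0.0 for the `.groups` argument:

```r
library(dplyr)

# Hypothetical stand-in for the question's data: repeated points per vessel
vessels <- data.frame(
  Month = c("Feb", "Feb", "Feb", "Jun", "Jun", "Jun", "Jun"),
  Type  = c("Fishing", "Fishing", "Fishing", "Fishing", "Fishing", "Sailing", "Fishing"),
  id    = c(34L, 34L, 31L, 20L, 20L, 6L, 20L)
)

# Option 1: count distinct ids within each Month/Type group
by_n_distinct <- vessels %>%
  group_by(Month, Type) %>%
  summarise(n = n_distinct(id), .groups = "drop")

# Option 2: keep one row per vessel per Month/Type, then count the rows
by_count <- vessels %>%
  distinct(Month, Type, id) %>%
  count(Month, Type)

print(by_n_distinct)
print(by_count)
```

Both options should give one row per Month/Type combination, with `n` being the number of unique vessels rather than the number of points.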
Or with data.table
library(data.table)
setDT(df)[, .(n = uniqueN(Type)), Month]
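The data.table version can likely be adapted in the same way to count unique 'id' per 'Month' and 'Type'. A sketch under the assumption of a small stand-in table (`vessels`, hypothetical values); `keyby` is used here instead of `by` only so that the result comes back sorted by the grouping columns:

```r
library(data.table)

# Hypothetical stand-in for the question's data: repeated points per vessel
vessels <- data.table(
  Month = c("Feb", "Feb", "Feb", "Jun", "Jun", "Jun", "Jun"),
  Type  = c("Fishing", "Fishing", "Fishing", "Fishing", "Fishing", "Sailing", "Fishing"),
  id    = c(34L, 34L, 31L, 20L, 20L, 6L, 20L)
)

# Number of unique vessel ids in each Month/Type group, sorted by the keys
unique_vessels <- vessels[, .(n = uniqueN(id)), keyby = .(Month, Type)]

print(unique_vessels)
```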
Or with base R
aggregate(Type ~ Month, unique(df[c('Type', 'Month')]), length)
aggregate(id ~ Month, unique(df[c('id', 'Month')]), length)
Regarding the efficiency of base R, especially aggregate, it would be slow, as mentioned here
Upvotes: 1