Reputation: 69
This is an airline dataset covering 2014 to 2018, with several carriers flying on any given date.
From this, I want a count of CANCELLATION, a binary column where 0 means not canceled and 1 means canceled, grouped by OP_CARRIER and by month.
I am new to R. I can only do these operations separately, such as counting with table() and grouping by OP_CARRIER, but not combined.
Any help will be much appreciated. Thank you.
Upvotes: 0
Views: 65
Reputation: 336
You need to create a month column first (I am assuming your date column is currently just a string).
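library(dplyr)
df %>%
  mutate(FL_DATE = as.POSIXct(FL_DATE)) %>%    # parse the date string
  mutate(month = format(FL_DATE, "%B")) %>%    # full month name, e.g. "January"
  group_by(month, OP_CARRIER) %>%
  summarise(cancelations = sum(CANCELLATION))  # summing a 0/1 column counts the 1s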
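This pools each calendar month across all years (every January from 2014 to 2018 is counted together). If you want the counts per year as well, add
mutate(year = format(FL_DATE, "%Y"))
to the pipeline before the group_by() and change the grouping to
group_by(month, year, OP_CARRIER)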
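Putting the pieces above together, here is a minimal sketch on made-up data. The toy data frame is hypothetical and only reuses the column names from the question. Two choices here are my own assumptions rather than part of the answer: "%m" is used instead of "%B" so that months sort chronologically and do not depend on the locale, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the question.
df <- data.frame(
  FL_DATE      = c("2014-01-03", "2014-01-15", "2014-02-01",
                   "2015-01-09", "2015-01-20"),
  OP_CARRIER   = c("AA", "AA", "AA", "AA", "DL"),
  CANCELLATION = c(1, 0, 1, 1, 0)
)

monthly <- df %>%
  mutate(FL_DATE = as.Date(FL_DATE),          # parse the date string
         year    = format(FL_DATE, "%Y"),     # e.g. "2014"
         month   = format(FL_DATE, "%m")) %>% # e.g. "01"; sorts chronologically
  group_by(OP_CARRIER, year, month) %>%
  summarise(cancellations = sum(CANCELLATION), .groups = "drop")

print(monthly)
```

Each row of monthly is then one carrier in one month of one year, with the number of canceled flights in that month.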
Upvotes: 2
Reputation: 1046
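Using dplyr, with lubridate for month()
library(dplyr)
library(lubridate)
df %>%
  # one row per carrier, cancellation status (0/1) and month number (1-12)
  group_by(OP_CARRIER, CANCELLATION, month = month(as.Date(FL_DATE))) %>%
  summarise(count = n())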
Upvotes: 1
Reputation: 887851
One option is rowsum in base R, as CANCELLATION is a binary variable
rowsum(df1$CANCELLATION, group = df1$OP_CARRIER)
In dplyr, if we also need the month
library(dplyr)
library(lubridate)
df1 %>%
group_by(OP_CARRIER, month = month(as.Date(FL_DATE))) %>%
summarise(CANCELLATION = sum(CANCELLATION))
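Note that month() returns the month number (1 to 12), so the same month from different years is pooled. If the years should stay separate, one base R possibility is to group on a year-month key with aggregate(). This is only a sketch: the toy df1 below is hypothetical, and the year_month column is a helper name introduced here for illustration.

```r
# Hypothetical toy data; only the column names come from the question.
df1 <- data.frame(
  FL_DATE      = c("2014-01-03", "2014-01-15", "2014-02-01",
                   "2015-01-09", "2015-01-20"),
  OP_CARRIER   = c("AA", "AA", "AA", "AA", "DL"),
  CANCELLATION = c(1, 0, 1, 1, 0)
)

# Helper key such as "2014-01" (an illustrative name, not from the question).
df1$year_month <- format(as.Date(df1$FL_DATE), "%Y-%m")

# Sum the 0/1 flags within each carrier and year-month combination.
res <- aggregate(CANCELLATION ~ OP_CARRIER + year_month, data = df1, FUN = sum)

print(res)
```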
Upvotes: 1