Reputation: 309
I have a data frame df with different columns:
df = data.frame(c("2012","2012","2012","2013"),
c("AAA","BBB","AAA","AAA"),
c("X","Not-serviced","X","Y"))
colnames(df) = c("year","type","service_type")
I need to get the following data frame, df2:
year type num_serviced num_notserviced num_total
2012 AAA 2 0 2
...
So I need to group the data by type and year, and then count the rows that are Not-serviced and the rows with any other value, e.g. X, Y, etc. (which count as serviced).
This is my code that calculates the total:
# Count the rows in each (type, year) group
temp = aggregate(df,
                 list(type = df$type,
                      year = df$year),
                 FUN = function(x){NROW(x)})
However, how can I create num_serviced and num_notserviced? There should be some IF-THEN rule like: if service_type == "Not-serviced" then num_notserviced++, else num_serviced++.
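The IF-THEN rule does not need an explicit loop: comparing the whole service_type column with "Not-serviced" gives a 0/1 indicator per row, and summing those indicators within each group gives the counts. The following is a minimal base-R sketch of that idea, assuming the columns are named directly in data.frame(); the helper columns num_serviced and num_notserviced added to df are an illustrative choice, not part of the original code.

```r
# Sample data from the question, with the columns named directly
df <- data.frame(year = c("2012", "2012", "2012", "2013"),
                 type = c("AAA", "BBB", "AAA", "AAA"),
                 service_type = c("X", "Not-serviced", "X", "Y"))

# Vectorised IF-THEN: 1 for a "Not-serviced" row, 0 otherwise
df$num_notserviced <- as.integer(df$service_type == "Not-serviced")
# Every other value (X, Y, ...) counts as serviced
df$num_serviced <- 1L - df$num_notserviced

# Sum the 0/1 indicators within each (year, type) group
df2 <- aggregate(df[c("num_serviced", "num_notserviced")],
                 by = list(year = df$year, type = df$type),
                 FUN = sum)
df2$num_total <- df2$num_serviced + df2$num_notserviced
df2
```

Note that aggregate() sorts the groups by the grouping variables rather than keeping the order in which they first appear in df.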
Upvotes: 2
Views: 98
Reputation: 1500
The fastest way to collapse data is to use the data.table package:
library(data.table)
df = data.frame(year = c("2012","2012","2012","2013"),
                type = c("AAA","BBB","AAA","AAA"),
                service_type = c("X","Not-serviced","X","Y"))
dt <- data.table(df)
# Per (year, type) group, count the rows that are / are not "Not-serviced"
dt <- dt[, list(num_serviced = sum(service_type != "Not-serviced"),
                num_notserviced = sum(service_type == "Not-serviced")),
         by = c("year", "type")]
# Total number of rows in each group
dt$num_total <- dt$num_serviced + dt$num_notserviced
# If you need to go back to a data.frame:
df <- data.frame(dt)
df
year type num_serviced num_notserviced num_total
1 2012 AAA 2 0 2
2 2012 BBB 0 1 1
3 2013 AAA 1 0 1
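If adding a package is not an option, a cross-tabulation in base R yields the same counts. The following is a sketch under stated assumptions: year and type are pasted into a single hypothetical "year type" key, which ends up as the row names of the result, and each row is labelled with the column it should be counted in.

```r
df <- data.frame(year = c("2012", "2012", "2012", "2013"),
                 type = c("AAA", "BBB", "AAA", "AAA"),
                 service_type = c("X", "Not-serviced", "X", "Y"))

# Label each row with the column it belongs to (the IF-THEN rule)
status <- factor(ifelse(df$service_type == "Not-serviced",
                        "num_notserviced", "num_serviced"),
                 levels = c("num_serviced", "num_notserviced"))
# Hypothetical combined group key, e.g. "2012 AAA"
key <- paste(df$year, df$type)

# table() counts every (key, status) pair, including zero counts;
# as.data.frame.matrix() keeps the result in wide form
counts <- as.data.frame.matrix(table(key, status))
counts$num_total <- rowSums(counts)
counts
```

Compared with the data.table version, this keeps everything in base R but leaves year and type combined in the row names, so they would need to be split back out if separate columns are required.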
Upvotes: 1
Reputation: 388797
With dplyr you can do:
library(dplyr)
df %>%
  group_by(year, type) %>%
  summarise(num_serviced = sum(service_type != "Not-serviced"),
            num_notserviced = sum(service_type == "Not-serviced"),
            num_total = num_serviced + num_notserviced)
# year type num_serviced num_notserviced num_total
# <fctr> <fctr> <int> <int> <int>
#1 2012 AAA 2 0 2
#2 2012 BBB 0 1 1
#3 2013 AAA 1 0 1
Upvotes: 2
Reputation: 886938
We can try with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'year' and 'type', take the sum of the logical vectors, and finally compute the total.
library(data.table)
# setDT() converts df to a data.table in place; the trailing [] prints the result
setDT(df)[, .(num_serviced = sum(service_type != "Not-serviced"),
              num_notserviced = sum(service_type == "Not-serviced")),
          .(year, type)][, Total := num_serviced + num_notserviced][]
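All of the answers rely on the same base-R behaviour: a comparison returns a logical vector, and sum() coerces TRUE to 1 and FALSE to 0, so the sum is the number of matching rows. A small sketch using the service_type values from the question:

```r
service_type <- c("X", "Not-serviced", "X", "Y")

# Logical vector: FALSE TRUE FALSE FALSE
not_serviced <- service_type == "Not-serviced"

sum(not_serviced)    # number of "Not-serviced" rows
sum(!not_serviced)   # number of all other rows (X, Y, ...)

# A misspelled level matches nothing, so the count silently becomes 0
sum(service_type == "Not_serviced")
```

Because a typo in the comparison string produces a count of 0 rather than an error, it is worth checking that num_serviced + num_notserviced equals the group size.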
Upvotes: 3