FiofanS

Reputation: 309

How to count aggregated data and create different counters?

I have a data frame df with different columns.

df = data.frame(c("2012","2012","2012","2013"),
                c("AAA","BBB","AAA","AAA"),
                c("X","Not-serviced","X","Y"))
colnames(df) = c("year","type","service_type")

I need to get the following dataframe df2:

year    type    num_serviced   num_notserviced   num_total
2012    AAA     2              0                 2
...

So I need to group the data by type and year, then count how often Not-serviced occurs and how often any other value occurs (e.g. X, Y, etc., which are all treated as serviced).

This is my code that calculates Total:

temp = aggregate(df,
                 list(type = df$type,
                      year = df$year),
                 FUN = function(x){NROW(x)})

However, how can I create num_serviced and num_notserviced? There should be some IF-THEN rule, like: if service_type == "Not-serviced" then num_notserviced++ else num_serviced++.

Upvotes: 2

Views: 98

Answers (3)

timat

Reputation: 1500

A fast way to collapse data by group is to use the data.table package:

library(data.table)

df = data.frame(year = c("2012","2012","2012","2013"),
                type = c("AAA","BBB","AAA","AAA"),
                service_type= c("X","Not-serviced","X","Y"))



dt <- data.table(df)
dt <- dt[, list(num_serviced    = sum(service_type != "Not-serviced"),
                num_notserviced = sum(service_type == "Not-serviced")),
         by = c("year", "type")]
dt$num_total <- dt$num_serviced + dt$num_notserviced

#if you need to go back to dataframe:
df <- data.frame(dt)

df

  year type num_serviced num_notserviced num_total
1 2012  AAA            2               0         2
2 2012  BBB            0               1         1
3 2013  AAA            1               0         1

Upvotes: 1

Ronak Shah

Reputation: 388797

With dplyr, you can do:

library(dplyr)
df %>%
    group_by(year,type) %>%
    summarise(num_serviced = sum(service_type != "Not-serviced"), 
              num_notserviced = sum(service_type == "Not-serviced"),
              num_total = num_serviced + num_notserviced)

#    year   type num_serviced num_notserviced num_total
#  <fctr> <fctr>        <int>           <int>     <int>
#1   2012    AAA            2               0         2
#2   2012    BBB            0               1         1
#3   2013    AAA            1               0         1
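
For comparison, the same grouped counts can probably be obtained in base R with aggregate, which is what the question started from. This is only a minimal sketch, not part of the original answer: the helper names tmp and res are made up for illustration, and the row order of the result may differ from the dplyr output above.

```r
# Sample data from the question
df <- data.frame(year = c("2012", "2012", "2012", "2013"),
                 type = c("AAA", "BBB", "AAA", "AAA"),
                 service_type = c("X", "Not-serviced", "X", "Y"))

# Work on a copy (hypothetical name) and add one 0/1 indicator per counter
tmp <- df
tmp$num_serviced    <- as.integer(tmp$service_type != "Not-serviced")
tmp$num_notserviced <- as.integer(tmp$service_type == "Not-serviced")

# Sum both indicators within each year/type group
res <- aggregate(cbind(num_serviced, num_notserviced) ~ year + type,
                 data = tmp, FUN = sum)

# The total is the sum of the two counters
res$num_total <- res$num_serviced + res$num_notserviced
res
```

The indicator columns play the role of the IF-THEN rule from the question: each row contributes 1 to exactly one of the two counters.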

Upvotes: 2

akrun

Reputation: 886938

We can try with data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'year', 'type', get the sum of logical vectors, and finally get the total.

library(data.table)
setDT(df)[, .(num_serviced = sum(service_type != "Not-serviced"), 
      num_notserviced = sum(service_type == "Not-serviced")), 
     .(year, type)][, Total := num_serviced + num_notserviced][]
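
To see why "the sum of logical vectors" gives a count, here is a small sketch on the service_type values from the question, without any grouping. The variable names are chosen for illustration only. In R, TRUE counts as 1 and FALSE as 0 when summed, so sum() over a comparison counts the matching rows.

```r
# The service_type column from the question, as a plain vector
service_type <- c("X", "Not-serviced", "X", "Y")

# The comparison yields a logical vector: FALSE TRUE FALSE FALSE
is_not_serviced <- service_type == "Not-serviced"

# Summing a logical vector counts its TRUE values
num_notserviced <- sum(is_not_serviced)   # 1
num_serviced    <- sum(!is_not_serviced)  # 3
num_total       <- num_serviced + num_notserviced  # 4
```

This replaces the explicit if/else counter from the question: the comparison makes the decision per row, and sum() does the counting.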

Upvotes: 3
