Davide Rivola
Davide Rivola

Reputation: 253

Calculate percentage by column values

I have the following data frame about registered vehicles in Switzerland:

Fuel     Year   Region   Count
Gasoline 2013       GE  169600
Diesel   2013       GE   46790
Hybrid   2013       GE    2268
Electric 2013       GE      85
Other    2013       GE     532
Gasoline 2013       VS  149232
Diesel   2013       VS   50591
Hybrid   2013       VS    1028
Electric 2013       VS     268
Other    2013       VS     261

I would like to add an additional "Pct" column in the data frame with the percentage of the fuel type by year and region but I have some difficulties to understand how. The result should be:

Fuel     Year   Region   Count  Pct
Gasoline 2013       GE  169600  0.7734
Diesel   2013       GE   46790  0.2134
Hybrid   2013       GE    2268  0.0134
Electric 2013       GE      85  0.0004
Other    2013       GE     532  0.0024
Gasoline 2013       VS  149232  0.7410
Diesel   2013       VS   50591  0.2512
Hybrid   2013       VS    1028  0.0051
Electric 2013       VS     268  0.0013
Other    2013       VS     261  0.0013

Upvotes: 2

Views: 2201

Answers (3)

Sam Firke
Sam Firke

Reputation: 23024

Using the dplyr package, which has soared in popularity since this question was originally posed:

library(dplyr)
d %>%
  group_by(Year, Region) %>%
  mutate(Pct = Count / sum(Count))

# A tibble: 10 x 5
# Groups:   Year, Region [2]
   Fuel      Year Region  Count      Pct
   <fct>    <int> <fct>   <int>    <dbl>
 1 Gasoline  2013 GE     169600 0.773   
 2 Diesel    2013 GE      46790 0.213   
 3 Hybrid    2013 GE       2268 0.0103  
 4 Electric  2013 GE         85 0.000388
 5 Other     2013 GE        532 0.00243 
 6 Gasoline  2013 VS     149232 0.741   
 7 Diesel    2013 VS      50591 0.251   
 8 Hybrid    2013 VS       1028 0.00510 
 9 Electric  2013 VS        268 0.00133 
10 Other     2013 VS        261 0.00130 

Upvotes: 0

akrun
akrun

Reputation: 887731

Using data.table. if d is the dataset

 library(data.table)
 setDT(d)[,Pct:=round(Count/sum(Count),4), by=list(Year, Region)]
 d
#         Fuel Year Region  Count    Pct
#  1: Gasoline 2013     GE 169600 0.7735
#  2:   Diesel 2013     GE  46790 0.2134
#  3:   Hybrid 2013     GE   2268 0.0103
#  4: Electric 2013     GE     85 0.0004
#  5:    Other 2013     GE    532 0.0024
#  6: Gasoline 2013     VS 149232 0.7410
#  7:   Diesel 2013     VS  50591 0.2512
#  8:   Hybrid 2013     VS   1028 0.0051
#  9: Electric 2013     VS    268 0.0013
# 10:    Other 2013     VS    261 0.0013

Upvotes: 2

Thomas
Thomas

Reputation: 44555

This is a good use case for ave and then simple vector division:

# load your data
d <- read.table(text="Fuel     Year   Region   Count
Gasoline 2013       GE  169600
Diesel   2013       GE   46790
Hybrid   2013       GE    2268
Electric 2013       GE      85
Other    2013       GE     532
Gasoline 2013       VS  149232
Diesel   2013       VS   50591
Hybrid   2013       VS    1028
Electric 2013       VS     268
Other    2013       VS     261", header = TRUE)

# `ave` by groups and divide
d$Pct <- d$Count/with(d, ave(Count, list(Year, Region), FUN = sum))
# or, equivalently:
# d <- within(d, Pct <- Count/ave(Count, list(Year, Region), FUN = sum))

Result:

> d
       Fuel Year Region  Count          Pct
1  Gasoline 2013     GE 169600 0.7734579865
2    Diesel 2013     GE  46790 0.2133850188
3    Hybrid 2013     GE   2268 0.0103431764
4  Electric 2013     GE     85 0.0003876411
5     Other 2013     GE    532 0.0024261772
6  Gasoline 2013     VS 149232 0.7410467772
7    Diesel 2013     VS  50591 0.2512215712
8    Hybrid 2013     VS   1028 0.0051047770
9  Electric 2013     VS    268 0.0013308174
10    Other 2013     VS    261 0.0012960572

Upvotes: 4

Related Questions