Reputation: 253
I have the following data frame about registered vehicles in Switzerland:
Fuel Year Region Count
Gasoline 2013 GE 169600
Diesel 2013 GE 46790
Hybrid 2013 GE 2268
Electric 2013 GE 85
Other 2013 GE 532
Gasoline 2013 VS 149232
Diesel 2013 VS 50591
Hybrid 2013 VS 1028
Electric 2013 VS 268
Other 2013 VS 261
I would like to add an additional "Pct" column in the data frame with the percentage of the fuel type by year and region but I have some difficulties to understand how. The result should be:
Fuel Year Region Count Pct
Gasoline 2013 GE 169600 0.7734
Diesel 2013 GE 46790 0.2134
Hybrid 2013 GE 2268 0.0134
Electric 2013 GE 85 0.0004
Other 2013 GE 532 0.0024
Gasoline 2013 VS 149232 0.7410
Diesel 2013 VS 50591 0.2512
Hybrid 2013 VS 1028 0.0051
Electric 2013 VS 268 0.0013
Other 2013 VS 261 0.0013
Upvotes: 2
Views: 2201
Reputation: 23024
Using the dplyr package, which has soared in popularity since this question was originally posed:
library(dplyr)
d %>%
group_by(Year, Region) %>%
mutate(Pct = Count / sum(Count))
# A tibble: 10 x 5
# Groups: Year, Region [2]
Fuel Year Region Count Pct
<fct> <int> <fct> <int> <dbl>
1 Gasoline 2013 GE 169600 0.773
2 Diesel 2013 GE 46790 0.213
3 Hybrid 2013 GE 2268 0.0103
4 Electric 2013 GE 85 0.000388
5 Other 2013 GE 532 0.00243
6 Gasoline 2013 VS 149232 0.741
7 Diesel 2013 VS 50591 0.251
8 Hybrid 2013 VS 1028 0.00510
9 Electric 2013 VS 268 0.00133
10 Other 2013 VS 261 0.00130
Upvotes: 0
Reputation: 887731
Using data.table
. if d
is the dataset
library(data.table)
setDT(d)[,Pct:=round(Count/sum(Count),4), by=list(Year, Region)]
d
# Fuel Year Region Count Pct
# 1: Gasoline 2013 GE 169600 0.7735
# 2: Diesel 2013 GE 46790 0.2134
# 3: Hybrid 2013 GE 2268 0.0103
# 4: Electric 2013 GE 85 0.0004
# 5: Other 2013 GE 532 0.0024
# 6: Gasoline 2013 VS 149232 0.7410
# 7: Diesel 2013 VS 50591 0.2512
# 8: Hybrid 2013 VS 1028 0.0051
# 9: Electric 2013 VS 268 0.0013
# 10: Other 2013 VS 261 0.0013
Upvotes: 2
Reputation: 44555
This is a good use case for ave
and then simple vector division:
# load your data
d <- read.table(text="Fuel Year Region Count
Gasoline 2013 GE 169600
Diesel 2013 GE 46790
Hybrid 2013 GE 2268
Electric 2013 GE 85
Other 2013 GE 532
Gasoline 2013 VS 149232
Diesel 2013 VS 50591
Hybrid 2013 VS 1028
Electric 2013 VS 268
Other 2013 VS 261", header = TRUE)
# `ave` by groups and divide
d$Pct <- d$Count/with(d, ave(Count, list(Year, Region), FUN = sum))
# or, equivalently:
# d <- within(d, Pct <- Count/ave(Count, list(Year, Region), FUN = sum))
Result:
> d
Fuel Year Region Count Pct
1 Gasoline 2013 GE 169600 0.7734579865
2 Diesel 2013 GE 46790 0.2133850188
3 Hybrid 2013 GE 2268 0.0103431764
4 Electric 2013 GE 85 0.0003876411
5 Other 2013 GE 532 0.0024261772
6 Gasoline 2013 VS 149232 0.7410467772
7 Diesel 2013 VS 50591 0.2512215712
8 Hybrid 2013 VS 1028 0.0051047770
9 Electric 2013 VS 268 0.0013308174
10 Other 2013 VS 261 0.0012960572
Upvotes: 4