Reputation: 729
I am using R and want to add certain values within a column but only if the rows satisfy a condition. So if I have a data frame data
below:
Team MP Win
ATL 14 .4
ATL 25 .4
ATL 14 .4
BOS 14 .55
BOS 20 .55
BOS 9 .55
How do I store the values of MP
for ATL
(14+25+14 = 53)and BOS
(14+20+9=43)?
EDIT: What if I also want to add a new variable that multiplies Win
by MP
/sums
(where sums
is the sum of MP
for each team
). So for the ATL
variables, I want the values .4*14/53 and .4*25/53, and for BOS
I want .55*14/43, .55*20/43, .55*9/43
Upvotes: 3
Views: 1619
Reputation: 10401
I think that would produce what you're looking for:
Edit
In light of akrun's excellent answer, here's a more compact solution:
dat$cumsums <- ave(dat$MP, dat$Team, FUN=sum)
dat$newvar <- with(dat, Win * (MP/cumsums))
Previous solution
cumsums <- by(data = dat$MP, INDICES = dat$Team, FUN = sum)
cumsums.df <- data.frame(Team = names(cumsums), cumsums = as.numeric(cumsums))
dat <- merge(x=dat, y=cumsums.df, by = "Team")
dat$newvar <- with(dat, Win * (MP/cumsums))
Results
dat
Team MP Win cumsums newvar
1 ATL 14 0.40 53 0.1056604
2 ATL 25 0.40 53 0.1886792
3 ATL 14 0.40 53 0.1056604
4 BOS 14 0.55 43 0.1790698
5 BOS 20 0.55 43 0.2558140
6 BOS 9 0.55 43 0.1151163
Data
dat <- read.csv(text="Team,MP,Win
ATL,14,.4
ATL,25,.4
ATL,14,.4
BOS,14,.55
BOS,20,.55
BOS,9,.55")
Upvotes: 4
Reputation: 887118
We could do this either using base R
, dplyr
or data.table
.
1. base R
Use within
and ave
to create the columns
within(dat, cumsums <- ave(MP, Team, FUN=sum)
newvar <- Win*(MP/cumsums))[c(1:3, 5:4)]
# Team MP Win cumsums newvar
#1 ATL 14 0.40 53 0.1056604
#2 ATL 25 0.40 53 0.1886792
#3 ATL 14 0.40 53 0.1056604
#4 BOS 14 0.55 43 0.1790698
#5 BOS 20 0.55 43 0.2558140
#6 BOS 9 0.55 43 0.1151163
2. data.table
If we need both the variables 'cumsums', 'newvar', convert the 'data.frame' to 'data.table' (setDT(dat)
), get the sum
of 'MP' column and use that to create the second column grouped by 'Team'
library(data.table)
setDT(dat)[, c('cumsums', 'newvar') := {tmp=sum(MP)
list(tmp, tmp1 = Win*MP/tmp)}, by = Team][]
# Team MP Win cumsums newvar
#1: ATL 14 0.40 53 0.1056604
#2: ATL 25 0.40 53 0.1886792
#3: ATL 14 0.40 53 0.1056604
#4: BOS 14 0.55 43 0.1790698
#5: BOS 20 0.55 43 0.2558140
#6: BOS 9 0.55 43 0.1151163
3. dplyr
After grouping by 'Team', use mutate
to create the columns 'cumsums' and 'newvar'
library(dplyr)
dat %>%
group_by(Team) %>%
mutate(cumsums= sum(MP), newvar= Win*MP/cumsums)
# Team MP Win cumsums newvar
#1 ATL 14 0.40 53 0.1056604
#2 ATL 25 0.40 53 0.1886792
#3 ATL 14 0.40 53 0.1056604
#4 BOS 14 0.55 43 0.1790698
#5 BOS 20 0.55 43 0.2558140
#6 BOS 9 0.55 43 0.1151163
dat <- structure(list(Team = c("ATL", "ATL", "ATL", "BOS", "BOS", "BOS"
), MP = c(14L, 25L, 14L, 14L, 20L, 9L), Win = c(0.4, 0.4, 0.4,
0.55, 0.55, 0.55)), .Names = c("Team", "MP", "Win"),
class = "data.frame", row.names = c(NA, -6L))
Upvotes: 2
Reputation: 564
aggregate
will do exactly what you are looking for
> data <- merge(data, aggregate(MP~Team, data = data, sum), by = 'Team', all.x = T)
> names(data) <- c('Team', 'MP', 'Win', 'SumByTeam')
> data$Value <- data$MP /data$SumByTeam * data$Win
> aggregate(Value ~ Team + MP.x, data = data, mean)
Team MP Value
1 BOS 9 0.1151163
2 ATL 14 0.1056604
3 BOS 14 0.1790698
4 BOS 20 0.2558140
5 ATL 25 0.1886792
Upvotes: 0