user4352158
user4352158

Reputation: 729

Add certain rows in a column that satisfy condition in R?

I am using R and want to add certain values within a column but only if the rows satisfy a condition. So if I have a data frame data below:

 Team MP Win
 ATL  14 .4
 ATL  25 .4
 ATL  14 .4
 BOS  14 .55
 BOS  20  .55
 BOS  9  .55

How do I store the values of MP for ATL (14+25+14 = 53)and BOS (14+20+9=43)?

EDIT: What if I also want to add a new variable that multiplies Win by MP/sums (where sums is the sum of MP for each team). So for the ATL variables, I want the values .4*14/53 and .4*25/53, and for BOS I want .55*14/43, .55*20/43, .55*9/43

Upvotes: 3

Views: 1619

Answers (3)

Dominic Comtois
Dominic Comtois

Reputation: 10401

I think that would produce what you're looking for:

Edit

In light of akrun's excellent answer, here's a more compact solution:

dat$cumsums <- ave(dat$MP, dat$Team, FUN=sum)
dat$newvar <- with(dat, Win * (MP/cumsums))

Previous solution

cumsums <- by(data = dat$MP, INDICES = dat$Team, FUN = sum)
cumsums.df <- data.frame(Team = names(cumsums), cumsums = as.numeric(cumsums))
dat <- merge(x=dat, y=cumsums.df, by = "Team")
dat$newvar <- with(dat, Win * (MP/cumsums))

Results

dat
  Team MP  Win cumsums    newvar
1  ATL 14 0.40      53 0.1056604
2  ATL 25 0.40      53 0.1886792
3  ATL 14 0.40      53 0.1056604
4  BOS 14 0.55      43 0.1790698
5  BOS 20 0.55      43 0.2558140
6  BOS  9 0.55      43 0.1151163

Data

dat <- read.csv(text="Team,MP,Win
ATL,14,.4
ATL,25,.4
ATL,14,.4
BOS,14,.55
BOS,20,.55
BOS,9,.55")

Upvotes: 4

akrun
akrun

Reputation: 887118

We could do this either using base R, dplyr or data.table.

1. base R

Use within and ave to create the columns

  within(dat, cumsums <- ave(MP, Team, FUN=sum)
                newvar <- Win*(MP/cumsums))[c(1:3, 5:4)]
  #  Team MP  Win cumsums    newvar
  #1  ATL 14 0.40      53 0.1056604
  #2  ATL 25 0.40      53 0.1886792
  #3  ATL 14 0.40      53 0.1056604
  #4  BOS 14 0.55      43 0.1790698
  #5  BOS 20 0.55      43 0.2558140
  #6  BOS  9 0.55      43 0.1151163

2. data.table

If we need both the variables 'cumsums', 'newvar', convert the 'data.frame' to 'data.table' (setDT(dat)), get the sum of 'MP' column and use that to create the second column grouped by 'Team'

library(data.table)
setDT(dat)[, c('cumsums', 'newvar') := {tmp=sum(MP) 
                   list(tmp, tmp1 = Win*MP/tmp)}, by = Team][]
#    Team MP  Win cumsums    newvar
#1:  ATL 14 0.40      53 0.1056604
#2:  ATL 25 0.40      53 0.1886792
#3:  ATL 14 0.40      53 0.1056604
#4:  BOS 14 0.55      43 0.1790698
#5:  BOS 20 0.55      43 0.2558140
#6:  BOS  9 0.55      43 0.1151163

3. dplyr

After grouping by 'Team', use mutate to create the columns 'cumsums' and 'newvar'

library(dplyr)
 dat %>% 
     group_by(Team) %>% 
     mutate(cumsums= sum(MP), newvar= Win*MP/cumsums)
 #  Team MP  Win cumsums    newvar
 #1  ATL 14 0.40      53 0.1056604
 #2  ATL 25 0.40      53 0.1886792
 #3  ATL 14 0.40      53 0.1056604
 #4  BOS 14 0.55      43 0.1790698
 #5  BOS 20 0.55      43 0.2558140
 #6  BOS  9 0.55      43 0.1151163

data

dat <- structure(list(Team = c("ATL", "ATL", "ATL", "BOS", "BOS", "BOS"
 ), MP = c(14L, 25L, 14L, 14L, 20L, 9L), Win = c(0.4, 0.4, 0.4, 
 0.55, 0.55, 0.55)), .Names = c("Team", "MP", "Win"),
 class = "data.frame", row.names = c(NA, -6L))

Upvotes: 2

SeaSprite
SeaSprite

Reputation: 564

aggregate will do exactly what you are looking for

> data <- merge(data, aggregate(MP~Team, data = data, sum), by = 'Team', all.x = T)
> names(data) <- c('Team', 'MP', 'Win', 'SumByTeam')
> data$Value <- data$MP /data$SumByTeam * data$Win
> aggregate(Value ~ Team + MP.x, data = data, mean)
  Team     MP         Value
1  BOS      9     0.1151163
2  ATL     14     0.1056604
3  BOS     14     0.1790698
4  BOS     20     0.2558140
5  ATL     25     0.1886792

Upvotes: 0

Related Questions