NBC
NBC

Reputation: 1698

Complicated table calculations

I have a dataframe df_workingFile:

group  | id  | flag | value |
 a     |  a  |  0   |  5    |  
 a     |  c  |  0   |  2    |  
 b     |  b  |  0   |  8    |  
 b     |  e  |  1   |  1    |  

I want to sum the value column based on group, but only on the row where group = id (there can only be one of these instances by group). I also don't want to include any values in the sum that have a flag:

group  | id  | flag | value | ValueSum
 a     |  a  |  0   |  5    |   7
 a     |  c  |  0   |  2    |   0
 b     |  b  |  0   |  8    |   8
 b     |  e  |  1   |  1    |   0

Input data:

df_workingFile <- structure(list(group = structure(c(1L, 1L, 2L, 2L), .Label = c("a", 
"b"), class = "factor"), id = structure(c(1L, 3L, 2L, 4L), .Label = c("a", 
"b", "c", "e"), class = "factor"), flag = c(0, 0, 0, 1), value = c(5, 
2, 8, 1)), .Names = c("group", "id", "flag", "value"), row.names = c(NA, 
4L), class = "data.frame")

Upvotes: 1

Views: 46

Answers (3)

lmo
lmo

Reputation: 38510

In base R, you could use ave with sum and rep to fill in the 0s for the final values.

df_workingFile$valueSum <-
          with(df_workingFile,
               ave(value * !flag, group, FUN=function(x) c(sum(x), rep(0, length(x)-1))))

This returns the desired results.

df_workingFile
  group id flag value valueSum
1     a  a    0     5        7
2     a  c    0     2        0
3     b  b    0     8        8
4     b  e    1     1        0

Upvotes: 1

Lamia
Lamia

Reputation: 3875

Using dplyr, you could do:

library(dplyr)
d %>% group_by(group) %>% mutate(valueSum=ifelse(flag==0 & group==id,sum(value),0))

  group    id  flag value valueSum
  <chr> <chr> <int> <int>    <dbl>
1     a     a     0     5        7
2     a     c     0     2        0
3     b     b     0     8        9
4     b     e     1     1        0

Upvotes: 1

be_green
be_green

Reputation: 753

This is quite simple with the data.table package. Here is my solution:

library(data.table)

temp <- data.table(group = c("a","a","b","b"),
           id = c("a","c","b","d"),
           flag = c(0,0,0,1),
           value = c(5,2,8,1))

temp[flag != 1,ValueSum := ifelse(group == id,sum(value), NA), by = group]

Before the first comma, you are able to specify a logical condition. In the middle you can do column assignment (I'm sure there's a cleaner solution in this case, but it works). After the second comma, you can specify a group-by statement. I would highly recommend getting familiar with the package, it has made my life far easier.

Upvotes: 1

Related Questions