Reputation: 279
I have a dataset with three columns as below:
data <- data.frame(
grpA = c(1,1,1,1,1,2,2,2),
idB = c(1,1,2,2,3,4,5,6),
valueC = c(10,10,20,20,10,30,40,50),
otherD = c(1,2,3,4,5,6,7,8)
)
valueC
is unique to each unique value of idB
.
I want to use dplyr
pipe (as the rest of my code is in dplyr
) and use group_by
on grpA
to get a new column with sum of valueC
values for each group.
The answer should be like:
newCol <- c(40,40,40,40,40,120,120,120)
but with data %>% group_by(grpA) %>%
mutate(newCol=sum(valueC)
, I get newCol <- c(70,70,70,70,70,120,120,120)
How do I include unique value of idB
? Is there anything else I can use instead of group_by
in dplyr %>%
pipe.
I cant use summarise
as I need to keep values in otherD
intact for later use.
Other option I have is to create newCol
separately through sql
and then merge
with left join
. But I am looking for a better solution inline.
If it has been answered before, please refer me to the link as I could not find any relevant answer to this issue.
Upvotes: 2
Views: 580
Reputation: 887851
We need unique
with match
data %>%
group_by(grpA) %>%
mutate(ind = sum(valueC[match(unique(idB), idB)]))
# A tibble: 8 x 5
# Groups: grpA [2]
# grpA idB valueC otherD ind
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 10 1 40
#2 1 1 10 2 40
#3 1 2 20 3 40
#4 1 2 20 4 40
#5 1 3 10 5 40
#6 2 4 30 6 120
#7 2 5 40 7 120
#8 2 6 50 8 120
Or another option is to get the distinct
rows by 'grpA', 'idB', grouped by 'grpA', get the sum
of 'valueC' and left_join
with the original data
data %>%
distinct(grpA, idB, .keep_all = TRUE) %>%
group_by(grpA) %>%
summarise(newCol = sum(valueC)) %>%
left_join(data, ., by = 'grpA')
Upvotes: 2