Reputation: 1484
I want to sum all columns except one specific column based on the condition by groups.
For example:
Col1 Col2 Condition Name P1 P2 P3 P4
1990 1 0 APPLE 10 20 20 30
1990 1 1 BAN 30 40 50 50
1990 1 1 CAR 40 40 30 40
1990 2 0 DOG 100 20 30 40
1990 2 1 APPLE 10 20 20 30
1990 2 1 APPLE 50 20 20 30
I want to SUM P2
, P3
, P4
and then APPEND P1
when Condition
equals to "0"
by Col1
and Col2
.
So the result will be:
Col1 Col2 Condition P1 P2 P3 P4
1990 1 0 10 100 100 120
1990 2 0 100 60 70 100
I know how to add in data.table
but have no idea with this.
DT[, .(lapply(.SD, sum, na.rm=TRUE), by=.(Col1, Col2), .SDcols=c("P2", "P3", "P4")]
It seems that DT[, setdiff(names(DT), c("P2", "P3", "P4")), with = FALSE]
is a key but still have no idea.
Upvotes: 1
Views: 746
Reputation: 886938
We can group by 'Col1', 'Col2', then summarise_at
the columns that starts_with
'P' (changed to range of columns as per OP's request) to get the sum
of values where 'Condition' is 0 and then create a 'Condition' column of 0
library(dplyr)
df1 %>%
group_by(Col1, Col2) %>%
group_by(P1 = sum(P1[Condition == 0]), add = TRUE) %>%
summarise_at(vars(P2:P4), sum) %>%
mutate(Condition = 0) %>%
ungroup
# A tibble: 2 x 7
# Col1 Col2 P1 P2 P3 P4 Condition
# <int> <int> <int> <int> <int> <int> <dbl>
#1 1990 1 10 100 100 120 0
#2 1990 2 100 60 70 100 0
Or using data.table
library(data.table)
setDT(df1)[, P1 := sum(P1[Condition == 0]), .(Col1, Col2)][,
lapply(.SD, sum) , .(Col1, Col2, P1), .SDcols = P2:P4]
Upvotes: 2