Reputation: 647
I have a dataset like this
V0 V1 V2 V3 X Y
#1 1 A 21 31 123 12
#2 2 A 21 31 245 24
#3 3 B 22 32 234 25
#4 4 C 23 33 190 30
#5 5 C 23 33 210 20
So there are duplicated value in V1, V2 and V3; and I would like to create a dataset like this, which sum up X and Y respectively, by V1-V3
V1 V2 V3 X Y V
#1 A 21 31 368 36 1,2
#2 B 22 32 234 25 3
#3 C 23 33 400 50 4,5
I was trying "aggregate" as below, but have no idea how to do it with both X and Y together, and at the same time, keep V2 and V3 as the original value, instead of sum them. And how can I put the value in V0 within the same group (V1) in another new variable.
df.sum <- aggregate(X~V1,data=df,FUN=sum)
I try to merge the "df.sum" with "df" by "V1" afterwards, but it turns out that all the duplicated value were merged as well.
Any suggestion? Thank you very much!
Upvotes: 1
Views: 2148
Reputation: 3501
Or by dplyr
library(dplyr)
df %>% group_by(V1,V2,V3) %>% summarise(X_sum=sum(X), Y_sum= sum(Y))
# Or as suggested, you could also do:
df %>% group_by(V1,V2,V3) %>% summarise_each(funs(sum))
#Source: local data frame [3 x 5]
#Groups: V1, V2
#
# V1 V2 V3 X_sum Y_sum
#1 A 21 31 368 36
#2 B 22 32 234 25
#3 C 23 33 400 50
# data
df <- structure(list(V1 = structure(c(1L, 1L, 2L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), V2 = c(21L, 21L, 22L, 23L, 23L),
V3 = c(31L, 31L, 32L, 33L, 33L), X = c(123L, 245L, 234L,
190L, 210L), Y = c(12L, 24L, 25L, 30L, 20L)), .Names = c("V1",
"V2", "V3", "X", "Y"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
Regarding the updated data, you could do something like:
df %>% group_by(V1,V2,V3) %>%
summarise_each(funs(sum, toString), X, Y, V0) %>%
select(-V0_sum,
-X_toString,
-Y_toString)
# you get
# V1 V2 V3 X_sum Y_sum V0_toString
# 1 A 21 31 368 36 1, 2
# 2 B 22 32 234 25 3
# 3 C 23 33 400 50 4, 5
# data
df <- structure(list(V0 = 1:5, V1 = structure(c(1L, 1L, 2L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), V2 = c(21L, 21L, 22L, 23L, 23L),
V3 = c(31L, 31L, 32L, 33L, 33L), X = c(123L, 245L, 234L,
190L, 210L), Y = c(12L, 24L, 25L, 30L, 20L)), .Names = c("V0",
"V1", "V2", "V3", "X", "Y"), class = "data.frame", row.names = c(NA,
-5L))
Upvotes: 4
Reputation: 193677
You're on the right track. Just do:
aggregate(. ~ V1 + V2 + V2, mydf, sum)
# V1 V2 V3 X Y
# 1 A 21 62 368 36
# 2 B 22 32 234 25
# 3 C 23 66 400 50
You can also do this in many other ways. For example, here's an approach using "data.table":
library(data.table)
as.data.table(mydf)[, lapply(.SD, sum), by = list(V1, V2, V3)]
Upvotes: 3