Reputation: 910
I want to aggregate rows (and sum up values) in the following data example:
df <- data.frame(from=c("A" ,"A", "A", "C", "C", "D", "A"),
to=c("B", "B", "B", "A", "A", "B", "D"),
values=c(5,6,2,10,2,6,3),
product=c("x","x", "x", "y", "z", "w", "w"),
year=c(1990,1991,1991,1990,1990,1991,1992))
> df
from to values product year
1 A B 5 x 1990
2 A B 6 x 1991
3 A B 2 x 1991
4 C A 10 y 1990
5 C A 2 z 1990
6 D B 6 w 1991
7 A D 3 w 1992
All rows containing same values/characters for the columns from
, to
, product
and year
should be aggregated to one row and the values in the values
column should be summed up.
I tried the following code:
aggregate(values~from+to+product+year, df, FUN=sum)
and
ddply(df_id, c("from", "to", "product", "year"), numcolwise(sum))
these codes worked well. However, both changed the order of rows (also columns which is less important) see below:
for aggregate:
from to product year values
1 A B x 1990 5
2 C A y 1990 10
3 C A z 1990 2
4 D B w 1991 6
5 A B x 1991 8
6 A D w 1992 3
and for ddply:
from to product year values
1 C A y 1990 10
2 C A z 1990 2
3 A B x 1990 5
4 A B x 1991 8
5 A D w 1992 3
6 D B w 1991 6
the expected result should look like:
from to values product year
1 A B 5 x 1990
2 A B 8 x 1991
3 C A 10 y 1990
4 C A 2 z 1990
5 D B 6 w 1991
6 A D 3 w 1992
Any ideas how to solve this order problem (at least for rows)? Thanks
Upvotes: 2
Views: 884
Reputation: 66819
The data.table package retains original by-group ordering by default:
library(data.table)
setDT(df)[, .(v = sum(values)), by=.(from,to,product,year)]
# from to product year v
# 1: A B x 1990 5
# 2: A B x 1991 8
# 3: C A y 1990 10
# 4: C A z 1990 2
# 5: D B w 1991 6
# 6: A D w 1992 3
Only with keyby=
instead of by=
would it sort groups.
Upvotes: 4
Reputation: 887541
Here is one way with dplyr
. We replace the 'values' by the sum
after grouping by the variables and get the distinct
rows.
library(dplyr)
df %>%
group_by(from, to, product, year) %>%
mutate(values=sum(values)) %>%
distinct
from to values product year
# (fctr) (fctr) (dbl) (fctr) (dbl)
#1 A B 5 x 1990
#2 A B 8 x 1991
#3 C A 10 y 1990
#4 C A 2 z 1990
#5 D B 6 w 1991
#6 A D 3 w 1992
EDIT: replaced unique
with distinct
based on @Steven Beaupre's comment.
Upvotes: 2