Arani
Arani

Reputation: 823

R - Using list() to aggregate data frame

I have this data frame:

> head(DF, 10)
         DATE    USER    CATEGORY     QTY
1  2017-09-04     A79 Footwear       2167
2  2017-08-31     A41 Accessories     342
3  2017-08-27     A34 Accessories     828
4  2017-08-22     A68 Accessories    1292
5  2017-08-23     A68 Accessories    1297
6  2017-08-23     A68 Footwear       1944
7  2017-08-25     A68 Accessories      60
8  2017-08-25     A68 Footwear          5
9  2017-08-25     A68 Apparel        2454
10 2017-08-29     A68 Accessories    2521

What I want is this:

> head(DF1, 10)
         DATE    USER                               CATEGORIES   QTY_SUM
1  2017-09-04     A79 Footwear                                      2167
2  2017-08-31     A41 Accessories                                    342
3  2017-08-27     A34 Accessories                                    828
4  2017-08-22     A68 Accessories                                   1292
5  2017-08-23     A68 Accessories-1297, Footwear-1944               3241
6  2017-08-25     A68 Accessories-60, Footwear-5, Apparel-2454      2519
7  2017-08-29     A68 Accessories                                   2521

I tried using aggregate, does not work well. I think this might be possible with something similar to this:

DF1 <- data.table(DF, key=c('DATE', 'USER_ID'))
DF1 <- DF1[, list(CATEGORIES=paste0(CATEGORY, "-", QTY), QTY=sum(QTY)), by=c('DATE', 'USER_ID')]
> head(x, 10) #getting this
         DATE    USER         CATEGORY     QTY
1  2017-09-04     A79 Footwear-2167       2167
2  2017-08-31     A41 Accessories-342      342
3  2017-08-27     A34 Accessories-828      828
4  2017-08-22     A68 Accessories-1292    1292
5  2017-08-23     A68 Accessories-1297    1297
6  2017-08-23     A68 Footwear-1944       1944
7  2017-08-25     A68 Accessories-60        60
8  2017-08-25     A68 Footwear-5             5
9  2017-08-25     A68 Apparel-2454        2454
10 2017-08-29     A68 Accessories         2521

What am I doing wrong? Please suggest if there are better ways to do this.

Upvotes: 1

Views: 38

Answers (1)

Vincent Bonhomme
Vincent Bonhomme

Reputation: 7443

Using dplyr,, you can:

df <- read.table(text="
DATE    USER    CATEGORY     QTY
1  2017-09-04     A79 Footwear       2167
2  2017-08-31     A41 Accessories     342
3  2017-08-27     A34 Accessories     828
4  2017-08-22     A68 Accessories    1292
5  2017-08-23     A68 Accessories    1297
6  2017-08-23     A68 Footwear       1944
7  2017-08-25     A68 Accessories      60
8  2017-08-25     A68 Footwear          5
9  2017-08-25     A68 Apparel        2454
10 2017-08-29     A68 Accessories    2521")

library(dplyr)

We first group_by DATE and USER (I guess), then you paste each item in CATEGORY with some decoration. Finally, you ungroup your data.frame (tibble here but that remains a data.frame):

df %>% 
  group_by(DATE, USER) %>% 
  summarise(CATEGORIES=paste(CATEGORY, QTY, sep="-", collapse=","),
            QTY_SUM=sum(QTY)) %>% 
  ungroup()

# A tibble: 7 x 4
DATE   USER                             CATEGORIES QTY_SUM
<fctr> <fctr>                                  <chr>   <int>
  1 2017-08-22    A68                       Accessories-1292    1292
2 2017-08-23    A68         Accessories-1297,Footwear-1944    3241
3 2017-08-25    A68 Accessories-60,Footwear-5,Apparel-2454    2519
4 2017-08-27    A34                        Accessories-828     828
5 2017-08-29    A68                       Accessories-2521    2521
6 2017-08-31    A41                        Accessories-342     342
7 2017-09-04    A79                          Footwear-2167    2167

Is this what you want?

Upvotes: 4

Related Questions