Reputation: 823
I have this data frame:
> head(DF, 10)
DATE USER CATEGORY QTY
1 2017-09-04 A79 Footwear 2167
2 2017-08-31 A41 Accessories 342
3 2017-08-27 A34 Accessories 828
4 2017-08-22 A68 Accessories 1292
5 2017-08-23 A68 Accessories 1297
6 2017-08-23 A68 Footwear 1944
7 2017-08-25 A68 Accessories 60
8 2017-08-25 A68 Footwear 5
9 2017-08-25 A68 Apparel 2454
10 2017-08-29 A68 Accessories 2521
What I want is this:
> head(DF1, 10)
DATE USER CATEGORIES QTY_SUM
1 2017-09-04 A79 Footwear 2167
2 2017-08-31 A41 Accessories 342
3 2017-08-27 A34 Accessories 828
4 2017-08-22 A68 Accessories 1292
5 2017-08-23 A68 Accessories-1297, Footwear-1944 3241
6 2017-08-25 A68 Accessories-60, Footwear-5, Apparel-2454 2519
7 2017-08-29 A68 Accessories 2521
I tried using aggregate
, does not work well. I think this might be possible with something similar to this:
DF1 <- data.table(DF, key=c('DATE', 'USER_ID'))
DF1 <- DF1[, list(CATEGORIES=paste0(CATEGORY, "-", QTY), QTY=sum(QTY)), by=c('DATE', 'USER_ID')]
> head(x, 10) #getting this
DATE USER CATEGORY QTY
1 2017-09-04 A79 Footwear-2167 2167
2 2017-08-31 A41 Accessories-342 342
3 2017-08-27 A34 Accessories-828 828
4 2017-08-22 A68 Accessories-1292 1292
5 2017-08-23 A68 Accessories-1297 1297
6 2017-08-23 A68 Footwear-1944 1944
7 2017-08-25 A68 Accessories-60 60
8 2017-08-25 A68 Footwear-5 5
9 2017-08-25 A68 Apparel-2454 2454
10 2017-08-29 A68 Accessories 2521
What am I doing wrong? Please suggest if there are better ways to do this.
Upvotes: 1
Views: 38
Reputation: 7443
Using dplyr,
, you can:
df <- read.table(text="
DATE USER CATEGORY QTY
1 2017-09-04 A79 Footwear 2167
2 2017-08-31 A41 Accessories 342
3 2017-08-27 A34 Accessories 828
4 2017-08-22 A68 Accessories 1292
5 2017-08-23 A68 Accessories 1297
6 2017-08-23 A68 Footwear 1944
7 2017-08-25 A68 Accessories 60
8 2017-08-25 A68 Footwear 5
9 2017-08-25 A68 Apparel 2454
10 2017-08-29 A68 Accessories 2521")
library(dplyr)
We first group_by
DATE and USER (I guess), then you paste each item in CATEGORY with some decoration. Finally, you ungroup your data.frame
(tibble
here but that remains a data.frame
):
df %>%
group_by(DATE, USER) %>%
summarise(CATEGORIES=paste(CATEGORY, QTY, sep="-", collapse=","),
QTY_SUM=sum(QTY)) %>%
ungroup()
# A tibble: 7 x 4
DATE USER CATEGORIES QTY_SUM
<fctr> <fctr> <chr> <int>
1 2017-08-22 A68 Accessories-1292 1292
2 2017-08-23 A68 Accessories-1297,Footwear-1944 3241
3 2017-08-25 A68 Accessories-60,Footwear-5,Apparel-2454 2519
4 2017-08-27 A34 Accessories-828 828
5 2017-08-29 A68 Accessories-2521 2521
6 2017-08-31 A41 Accessories-342 342
7 2017-09-04 A79 Footwear-2167 2167
Is this what you want?
Upvotes: 4