Repeated rows In R with different sequence

Question

I managed to get my raw data in this form(in R) i.e for each combination of products(combination of 3) their respective count, but as you can see there is duplication happening, there are rows with same product but different ordering of those products, I needed a way to combine those rows irrespective of the order and add the sum(num) of those to get the combined sum. This is just a part of the entire data set. Help me to figure out a way to do this.

             pages               sum(num)
Badezimmer,Baumarkt,Büromöbel          6
Badezimmer,Baumarkt,Dekoration        14
Badezimmer,Baumarkt,Flur              30
Badezimmer,Baumarkt,Garten            18
Badezimmer,Baumarkt,Heimtextilien    100
Badezimmer,Baumarkt,Kinder            28
Badezimmer,Büromöbel,Baumarkt         16
Badezimmer,Flur,Baumarkt              40

akrun · Accepted Answer

Here is an option using cSplit from library(splitstackshape). We convert the 'data.frame' to 'data.table', create a row id column 'rn' with the option keep.rownames=TRUE, split the 'pages' column by ',' and convert to 'long' format with cSplit. Grouped by 'rn', we sort the 'pages' and paste it together, also we get the first value of 'sum.num.'. Then, we get the sum of 'sum.num.' grouped by 'pages'.

library(splitstackshape)
library(data.table)
cSplit(setDT(df1, keep.rownames=TRUE), 'pages', ',', 'long')[,
   list(pages=toString(sort(pages)), sum.num.= sum.num.[1]) ,rn
   ][,list(Sum=sum(sum.num.)) , .(pages)]
 #                                 pages Sum
#1:     Badezimmer, Baumarkt, Büromöbel  22
#2:    Badezimmer, Baumarkt, Dekoration  14
#3:          Badezimmer, Baumarkt, Flur  70
#4:        Badezimmer, Baumarkt, Garten  18
#5: Badezimmer, Baumarkt, Heimtextilien 100
#6:        Badezimmer, Baumarkt, Kinder  28

NOTE: 'df1' from @RHertel's post.

Repeated rows In R with different sequence

Answers (2)

Related Questions