Anubhav Dikshit
Anubhav Dikshit

Reputation: 1829

Find overlapping elements in a dataframe

I have a dataframe 'df' with the following structure:

Assume 4 distinct stores and titles exist

Title Store
T1    S1
T1    S2
T1    S3
T1    S4
T2    S1
T2    S2
T2    S4
T3    S1
T3    S4
T4    S1
T4    S2

Problem:

I would like to find common stores for all combination of the titles

Expected Output:

Title_combination     Common_Store      
T1,T2,T3,T4           S1     
T1,T2,T3              S1,S4
T1,T2,T4              S1,S2
........             ...... so on

Upvotes: 1

Views: 59

Answers (1)

chinsoon12
chinsoon12

Reputation: 25225

Using base functions. Explanation inline.

data:

tbl <- read.table(text="Title Store
T1    S1
T1    S2
T1    S3
T1    S4
T2    S1
T2    S2
T2    S4
T3    S1
T3    S4
T4    S1
T4    S2", header=TRUE)

workings:

#get unique titles
titles <- unique(tbl$Title)

#combine rows into a single data.frame
do.call(rbind, unlist(
    #for each set of n titles
    lapply(seq_along(titles), function(n)
        #using combn to generate combi and apply function to each combi
        combn(titles, n, function(subtitles) {
            #recursively intersect all stores for each title within the set subtitles 
            cstores <- Reduce(function(s, t2) intersect(s, tbl$Store[tbl$Title==t2]), 
                subtitles[-1], 
                tbl$Store[tbl$Title==subtitles[1]])
            data.frame(
                Title_combi=paste(subtitles, collapse=","),
                Common_Store=paste(cstores, collapse=",")
            )
        }, simplify=FALSE) #dont simplify results from combn
    ), 
    recursive=FALSE)) #unlist 1 level of combi results

results:

#    Title_combi Common_Store
# 1           T1  S1,S2,S3,S4
# 2           T2     S1,S2,S4
# 3           T3        S1,S4
# 4           T4        S1,S2
# 5        T1,T2     S1,S2,S4
# 6        T1,T3        S1,S4
# 7        T1,T4        S1,S2
# 8        T2,T3        S1,S4
# 9        T2,T4        S1,S2
# 10       T3,T4           S1
# 11    T1,T2,T3        S1,S4
# 12    T1,T2,T4        S1,S2
# 13    T1,T3,T4           S1
# 14    T2,T3,T4           S1
# 15 T1,T2,T3,T4           S1

Upvotes: 1

Related Questions