PROC SQL in SAS - All Pairs of Items

Question

I have a dataset in which I need to look at all pairs of items that are together from within another group. I've created a toy example below to further explain.

BUNCH    FRUITS
1        apples
1        bananas
1        mangos
2        apples
3        bananas
3        apples
4        bananas
4        apples

What I want is a listing of all possible pairs and sum the frequency they occur together within a bunch. My output would ideally look like this:

FRUIT1    FRUIT2     FREQUENCY
APPLES    BANANAS    3
APPLES    MANGOS     1

My end goal is to make something that I'll eventually be able to import into Gephi for a network analysis. For this I need a Source and Target column (aka FRUIT1 and FRUIT2 above).

I think there are a few other ways to approach this as well without using PROC SQL (Maybe using PROC TRANSPOSE) but this is where I've started.

SOLUTION

Thanks for the help. Sample code below for anyone interested in something similar:

proc sql;
    create table fruit_combo as
    select a.FRUIT as FRUIT1, b.FRUIT as FRUIT2, count(*) as FREQUENCY
    from FRUITS a, FRUITS b
    where a.BUNCH=b.BUNCH and and not a.FRUIT= b.FRUIT
    group by FRUIT1, FRUIT2;
    quit;

Joe · Accepted Answer

Simplest approach is to do a cartesian (full) join of the table to itself, on t1.ID=t2.ID and t1.FRUIT ne t2.FRUIT. That will generate the full combination set, which you could then summarize.

PROC SQL in SAS - All Pairs of Items

Answers (2)

Related Questions