Intersection between multiple files

Question

I have multiple files (say, 3). Each file has only one column and looks like the following:

File A

America
Russia
China
UK

File B

India
China
Russia

File C

China
America
Russia
Iran

Now, to compute the intersection, that is, to get the elements common to all files, I do

python -c 'import sys;print "".join(sorted(set.intersection(*[set(open(a).readlines()) for a in sys.argv[1:]])))' File1 File2 File3 File4

But if I also need the pairwise overlap between these files, how can I loop the process so that I get the set of elements present in all of them and also the elements present in A&B, A&C, and B&C?

Help in Python would be appreciated.

Ashwini Chaudhary · Accepted Answer

To get lines that are common to all files you can use:

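import sys

data = []  # one set of lines per file; created once, before the loop
for f in sys.argv[1:]:
    with open(f) as inp:
        # strip trailing whitespace so a missing final newline does not matter
        lines = set(line.rstrip() for line in inp)
        data.append(lines)
# intersect once, after every file has been read
common_lines = data[0].intersection(*data[1:])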

For the second part use itertools.combinations:

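import sys
from itertools import combinations

# combinations(..., 2) yields every unordered pair of files exactly once
for f1, f2 in combinations(sys.argv[1:], 2):
    with open(f1) as inp1, open(f2) as inp2:
        common = set(line.rstrip() for line in inp1).intersection(map(str.rstrip, inp2))
        # label each result with the pair of files it belongs to
        print("%s & %s: %s" % (f1, f2, sorted(common)))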
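The two steps above can also be combined so that each file is read only once. What follows is a minimal Python 3 sketch, not a definitive implementation: the helper names `read_lines`, `overlaps`, and `main` are made up for illustration, and skipping blank lines is an assumption that the question does not state.

```python
import sys
from itertools import combinations


def read_lines(path):
    """Return the set of stripped lines in a file (blank lines skipped: an assumption)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def overlaps(sets_by_name):
    """Return (common_to_all, pairwise) for a dict mapping a name to a set."""
    names = list(sets_by_name)
    # set.intersection needs at least one set, so guard the empty case
    common = set.intersection(*sets_by_name.values()) if names else set()
    # one entry per unordered pair of names, e.g. ("A", "B")
    pairwise = {
        (a, b): sets_by_name[a] & sets_by_name[b]
        for a, b in combinations(names, 2)
    }
    return common, pairwise


def main(paths):
    data = {path: read_lines(path) for path in paths}
    common, pairwise = overlaps(data)
    print("all files:", sorted(common))
    for (a, b), shared in pairwise.items():
        print(a, "&", b, ":", sorted(shared))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With the three sample files from the question, the elements common to all files are China and Russia; A&B and B&C also give China and Russia, while A&C gives America, China, and Russia.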
