how to find the intersections of two files in python

Question

I have two big files like the following:

file1

108 a  107          a 6    1

110 a  109          a 50   0

121 a  120          a 107  0

200 a  198          a 108  0

202 a  200          a 109  1

                    a 120  0

                    a 130  1

I want to find the intersections based on the second and third column of file1, and the first and second columns of file2, and then print the whole intersection line of file1 and the third column of file2:

output:

108 a 107 0

110 a 109 1

121 a 120 0

I know how to find the intersections of two files based on the whole line. Any clue on how to implement for my case in python? Thank you!

RomanPerekhrest · Accepted Answer

With pandas.DataFrame.merge function (as one-line solution):

In [122]: df1 = pd.read_table('file1', delim_whitespace=True, header=None)

In [123]: df1
Out[123]: 
     0  1    2
0  108  a  107
1  110  a  109
2  121  a  120
3  200  a  198
4  202  a  200

In [124]: df2 = pd.read_table('file2', delim_whitespace=True, header=None)

In [125]: df2
Out[125]: 
   0    1  2
0  a    6  1
1  a   50  0
2  a  107  0
3  a  108  0
4  a  109  1
5  a  120  0
6  a  130  1

In [126]: print(df1.merge(df2, left_on=[1,2], right_on=[0,1]).loc[:, ['0_x','1_x','2_x','2_y']].to_string(header=False, index=False))
108  a  107  0
110  a  109  1
121  a  120  0

To dump the result to csv file:

df1.merge(df2, left_on=[1,2], right_on=[0,1]).loc[:, ['0_x','1_x','2_x','2_y']].to_csv('result.csv', sep='	', header=False, index=False)

how to find the intersections of two files in python

Answers (1)

Related Questions