jing
jing

Reputation: 415

how to find the intersections of two files in python

I have two big files like the following:

file1

108 a  107          a 6    1

110 a  109          a 50   0

121 a  120          a 107  0

200 a  198          a 108  0

202 a  200          a 109  1

                    a 120  0

                    a 130  1

I want to find the intersections based on the second and third column of file1, and the first and second columns of file2, and then print the whole intersection line of file1 and the third column of file2:

output:

108 a 107 0

110 a 109 1

121 a 120 0

I know how to find the intersections of two files based on the whole line. Any clue on how to implement for my case in python? Thank you!

Upvotes: 0

Views: 402

Answers (1)

RomanPerekhrest
RomanPerekhrest

Reputation: 92854

With pandas.DataFrame.merge function (as one-line solution):

In [122]: df1 = pd.read_table('file1', delim_whitespace=True, header=None)

In [123]: df1
Out[123]: 
     0  1    2
0  108  a  107
1  110  a  109
2  121  a  120
3  200  a  198
4  202  a  200

In [124]: df2 = pd.read_table('file2', delim_whitespace=True, header=None)

In [125]: df2
Out[125]: 
   0    1  2
0  a    6  1
1  a   50  0
2  a  107  0
3  a  108  0
4  a  109  1
5  a  120  0
6  a  130  1

In [126]: print(df1.merge(df2, left_on=[1,2], right_on=[0,1]).loc[:, ['0_x','1_x','2_x','2_y']].to_string(header=False, index=False))
108  a  107  0
110  a  109  1
121  a  120  0

To dump the result to csv file:

df1.merge(df2, left_on=[1,2], right_on=[0,1]).loc[:, ['0_x','1_x','2_x','2_y']].to_csv('result.csv', sep='\t', header=False, index=False)

Upvotes: 2

Related Questions