Reputation: 415
I have two big files like the following:
file1
108 a 107 a 6 1
110 a 109 a 50 0
121 a 120 a 107 0
200 a 198 a 108 0
202 a 200 a 109 1
a 120 0
a 130 1
I want to find the intersections based on the second and third column of file1, and the first and second columns of file2, and then print the whole intersection line of file1 and the third column of file2:
output:
108 a 107 0
110 a 109 1
121 a 120 0
I know how to find the intersections of two files based on the whole line. Any clue on how to implement for my case in python? Thank you!
Upvotes: 0
Views: 402
Reputation: 92854
With pandas.DataFrame.merge
function (as one-line solution):
In [122]: df1 = pd.read_table('file1', delim_whitespace=True, header=None)
In [123]: df1
Out[123]:
0 1 2
0 108 a 107
1 110 a 109
2 121 a 120
3 200 a 198
4 202 a 200
In [124]: df2 = pd.read_table('file2', delim_whitespace=True, header=None)
In [125]: df2
Out[125]:
0 1 2
0 a 6 1
1 a 50 0
2 a 107 0
3 a 108 0
4 a 109 1
5 a 120 0
6 a 130 1
In [126]: print(df1.merge(df2, left_on=[1,2], right_on=[0,1]).loc[:, ['0_x','1_x','2_x','2_y']].to_string(header=False, index=False))
108 a 107 0
110 a 109 1
121 a 120 0
To dump the result to csv file:
df1.merge(df2, left_on=[1,2], right_on=[0,1]).loc[:, ['0_x','1_x','2_x','2_y']].to_csv('result.csv', sep='\t', header=False, index=False)
Upvotes: 2