Reputation: 63
I'm using python 3.0 in windows.
i have 2 csv files
file1.csv
0, 10,12
0,11,12
1,15,12
2, 17,12
file2.csv
0, 2, 1
1,22, 1
3, 11, 1
output.csv
0, 10,12
0,11,12
0, 2, 1
1,15,12
1,22, 1
2, 17,12
3, 11, 1
i tried the following code
a = pd.read_csv('file1.csv')
b = pd.read_csv('file2.csv')
c = pd.concat([a, b], join='outer')
c.to_csv("Output.csv", index = False)
But my output is
0, 10,12
0,11,12
1,15,12
2, 17,12
0, 2, 1
1,22, 1
3, 11, 1
Can you give me some pointers please. I'm new to python.
Upvotes: 0
Views: 1499
Reputation: 14665
This does not create the output file but it demonstrates how heapq.merge
could help:
from heapq import merge
inputs = [file(f) for f in ['file1.csv', 'file2.csv']]
for line in merge(*inputs):
print line,
With the sample data this produces
0, 10,12
0, 2, 1
0,11,12
1,15,12
1,22, 1
2, 17,12
3, 11, 1
However this differs from the sample output in the ordering of the initial lines:
0, 10,12
0,11,12
0, 2, 1
but I'm not sure how to produce this ordering. The sample output lines do not appear to be ordered by character or numeric columns (a numeric ordering of the fields would presumably put 0, 2, 1
first).
EDIT: it appears the lines are ordered as if spaces were not present. The following example:
from heapq import merge
def reader(f):
for line in file(f):
yield line.replace(' ',''), line
inputs = [reader(f) for f in ['file1.csv', 'file2.csv']]
for pair in merge(*inputs):
print pair[1],
generates this ordering:
0, 10,12
0,11,12
0, 2, 1
1,15,12
1,22, 1
2, 17,12
3, 11, 1
Upvotes: 0
Reputation: 895
You can concatenate and sort them with pandas:
df = pd.concat([a, b]).astype('str')
df = df.sort_values(list(df.columns))
df.to_csv('Output.csv', index=False)
Upvotes: 1