Ginson
Ginson

Reputation: 63

How to merge two csv files line by line

I'm using python 3.0 in windows.

i have 2 csv files

file1.csv

0, 10,12
0,11,12
1,15,12
2, 17,12

file2.csv

0, 2, 1
1,22, 1
3, 11, 1

output.csv

0, 10,12
0,11,12
0, 2, 1
1,15,12
1,22, 1
2, 17,12
3, 11, 1

i tried the following code

a = pd.read_csv('file1.csv')
b = pd.read_csv('file2.csv')
c = pd.concat([a, b], join='outer')
c.to_csv("Output.csv", index = False)

But my output is

0, 10,12
0,11,12
1,15,12
2, 17,12
0, 2, 1
1,22, 1
3, 11, 1

Can you give me some pointers please. I'm new to python.

Upvotes: 0

Views: 1499

Answers (2)

jq170727
jq170727

Reputation: 14665

This does not create the output file but it demonstrates how heapq.merge could help:

from heapq import merge
inputs = [file(f) for f in ['file1.csv', 'file2.csv']]
for line in merge(*inputs):
    print line,

With the sample data this produces

0, 10,12
0, 2, 1
0,11,12
1,15,12
1,22, 1
2, 17,12
3, 11, 1

However this differs from the sample output in the ordering of the initial lines:

0, 10,12
0,11,12
0, 2, 1

but I'm not sure how to produce this ordering. The sample output lines do not appear to be ordered by character or numeric columns (a numeric ordering of the fields would presumably put 0, 2, 1 first).

EDIT: it appears the lines are ordered as if spaces were not present. The following example:

from heapq import merge
def reader(f):
    for line in file(f):
        yield line.replace(' ',''), line
inputs = [reader(f) for f in ['file1.csv', 'file2.csv']]
for pair in merge(*inputs):
    print pair[1],

generates this ordering:

0, 10,12
0,11,12
0, 2, 1
1,15,12
1,22, 1
2, 17,12
3, 11, 1

Upvotes: 0

Evan Nowak
Evan Nowak

Reputation: 895

You can concatenate and sort them with pandas:

df = pd.concat([a, b]).astype('str')

df = df.sort_values(list(df.columns))

df.to_csv('Output.csv', index=False)

Upvotes: 1

Related Questions