Chenlu
Chenlu

Reputation: 459

how to concatenate two csv files and keep the original order of columns in python?

Actually there are many parts in this question. I have solved some parts by viewing the answer of other stack overflow questions. However, there is still one tiny problem not solved. The output csv file tends to order alphabetically or in other order, which is not what I want.

As an example, I'm going to concatenate two csv files vertically. The two csv look like the following.

    a.csv
    B, A, C, E
    1, 1, 1, 1

    b.csv
    B, A, D, C
    2, 2, 2, 2

The result I'd like to get is

    c.csv
    B, A, D, C, E
    1, 1,  , 1, 1
    2, 2, 2, 2,   

First, I read them into pandas data frames.

    a = pd.read_csv("a.csv")
    b = pd.read_csv("b.csv")

Then concatenate them and write to csv by

    c = pd.concat([a, b], join='outer')
    c.to_csv("c.csv", index=False)

The output csv looks like

    c.csv
    A, C, D, B, E
    1, 1,  , 1, 1
    2, 2, 2,  , 2 

Is there any way to solve the problem? I once thought of something like the code from the answer Preserving column order in Python Pandas DataFrame

    df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')

However, there are hundreds of columns in my csv file, I can't manually write down the order of column names. And for each group of files, the column names are different. I tried

    set(a.columns.values).union(list(b.columns.values))

It also doesn't work, because set will disorder the list.

Upvotes: 2

Views: 3882

Answers (2)

Jon Clements
Jon Clements

Reputation: 142206

Build up an output order which you can then supply to c.to_csv(...), eg:

from collections import OrderedDict
out_order = OrderedDict.fromkeys(a.columns)
out_order.update(OrderedDict.fromkeys(b.columns))
out_order = list(out_order)
# ['B', 'A', 'C', 'E', 'D']

c.to_csv("c.csv", index=False, columns=out_order)

Upvotes: 1

rwester
rwester

Reputation: 164

You almost have it with a.columns

col_names = a.columns.tolist() # list of column names
sorted_cols = sorted(col_names)

df.to_csv("dfTest.txt","\t",header=True,cols=sorted_cols, engine='python')

In one line:

df.to_csv("dfTest.txt","\t",
          header=True,
          cols=sorted(a.columns.tolist()),
          engine='python')

Upvotes: 0

Related Questions