Reputation: 459
Actually there are many parts in this question. I have solved some parts by viewing the answer of other stack overflow questions. However, there is still one tiny problem not solved. The output csv file tends to order alphabetically or in other order, which is not what I want.
As an example, I'm going to concatenate two csv files vertically. The two csv look like the following.
a.csv
B, A, C, E
1, 1, 1, 1
b.csv
B, A, D, C
2, 2, 2, 2
The result I'd like to get is
c.csv
B, A, D, C, E
1, 1, , 1, 1
2, 2, 2, 2,
First, I read them into pandas data frames.
a = pd.read_csv("a.csv")
b = pd.read_csv("b.csv")
Then concatenate them and write to csv by
c = pd.concat([a, b], join='outer')
c.to_csv("c.csv", index=False)
The output csv looks like
c.csv
A, C, D, B, E
1, 1, , 1, 1
2, 2, 2, , 2
Is there any way to solve the problem? I once thought of something like the code from the answer Preserving column order in Python Pandas DataFrame
df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')
However, there are hundreds of columns in my csv file, I can't manually write down the order of column names. And for each group of files, the column names are different. I tried
set(a.columns.values).union(list(b.columns.values))
It also doesn't work, because set will disorder the list.
Upvotes: 2
Views: 3882
Reputation: 142206
Build up an output order which you can then supply to c.to_csv(...)
, eg:
from collections import OrderedDict
out_order = OrderedDict.fromkeys(a.columns)
out_order.update(OrderedDict.fromkeys(b.columns))
out_order = list(out_order)
# ['B', 'A', 'C', 'E', 'D']
c.to_csv("c.csv", index=False, columns=out_order)
Upvotes: 1
Reputation: 164
You almost have it with a.columns
col_names = a.columns.tolist() # list of column names
sorted_cols = sorted(col_names)
df.to_csv("dfTest.txt","\t",header=True,cols=sorted_cols, engine='python')
In one line:
df.to_csv("dfTest.txt","\t",
header=True,
cols=sorted(a.columns.tolist()),
engine='python')
Upvotes: 0