Chris
Chris

Reputation: 767

python - how to append n csv files preserving the column order

How to append n csv files preserving column order?

Column headers might differ in each file, so that some columns might be missing or some new columns might be appearing in the consecutive files.

For example

File1
Column2 Column1 Column4
1       1       Text1
1       1       Text1

File2
Column2 Column1 Column3 Column4 Column5
2         2        2    Text2   xxx
2         2        2    Text2   xxx

File3
Column2 Column1 Column3 Column4
3              3    3   Text3
3              3    3   Text3

Desired Output

Column2 Column1 Column4 Column3 Column5
1          1    Text1
1          1    Text1
2          2    Text2       2     xxx
2          2    Text2       2     xxx
3          3    Text3       3
3          3    Text3       3

I am trying to use pandas but the columns in the final output get sorted alphabetically. Is there any way to avoid the alphabetical sorting / control the column order?

import pandas as pd
import glob

files = glob.glob("C:\CSVs\*.csv")
df_list = []

for filename in sorted(files):
    df_list.append(pd.read_csv(filename))
full_df = pd.concat(df_list, ignore_index= True)

full_df.to_csv('output3.csv')

I am getting

Column1 Column2 Column3 Column4 Column5
...

Upvotes: 1

Views: 126

Answers (4)

Tiny.D
Tiny.D

Reputation: 6556

Try this way out, it will work dynamically according to the header of files:

import pandas as pd
import glob

files = glob.glob("./CSVs/*.csv")
df_list = []
col_list=[]
for filename in sorted(files):
    df = pd.read_csv(filename)
    df_list.append(df)
    for e in list(df.columns):
        if e not in new_col_list:
            col_list.append(e)
full_df = pd.concat(df_list)
full_df = full_df[col_list]
full_df

Output:

    Column2 Column1 Column4 Column3 Column5
0   1       1       Text1   NaN     NaN
1   1       1       Text1   NaN     NaN
0   2       2       Text2   2       xxx
1   2       2       Text2   2       xxx
0   3       3       Text3   3       NaN
1   3       3       Text3   3       NaN

Upvotes: 1

spies006
spies006

Reputation: 2927

>>> df = file1.append(file2).append(file3)
>>> df.reset_index(inplace=True)

>>> df.reindex(columns=['Column2', 'Column1', 'Column4', 'Column3', 'Column5'])
   Column2  Column1 Column4  Column3 Column5
0        1        1   Text1      NaN     NaN
1        1        1   Text1      NaN     NaN
2        2        2   Text2      2.0     xxx
3        2        2   Text2      2.0     xxx
4        3        3   Text3      3.0     NaN
5        3        3   Text3      3.0     NaN

Upvotes: 0

Troy D
Troy D

Reputation: 2245

If you just want to set the final column order, you can try this:

full_df = full_df.reindex_axis(['Column2', 'Column1', 'Column4', 'Column3', 'Column5'], axis=1)
full_df.to_csv('output3.csv')

Upvotes: 0

import pandas as pd
import glob
import itertools

files = glob.glob("C:\CSVs\*.csv")
df_list = []
new_col_list = []
for filename in sorted(files):
    x = pd.read_csv(filename)
    df_list.append(x)
    new_col_list.append(x.columns)
full_df = pd.concat(df_list, ignore_index= True)
new_col_list_merged = list(itertools.chain.from_iterable(new_col_list))
full_df_updated = full_df[new_col_list_merged ]
full_df.to_csv('output3.csv')

Hope this should work

Upvotes: 0

Related Questions