Reputation: 767
How to append n csv files preserving column order?
Column headers might differ in each file, so that some columns might be missing or some new columns might be appearing in the consecutive files.
For example
File1
Column2 Column1 Column4
1 1 Text1
1 1 Text1
File2
Column2 Column1 Column3 Column4 Column5
2 2 2 Text2 xxx
2 2 2 Text2 xxx
File3
Column2 Column1 Column3 Column4
3 3 3 Text3
3 3 3 Text3
Desired Output
Column2 Column1 Column4 Column3 Column5
1 1 Text1
1 1 Text1
2 2 Text2 2 xxx
2 2 Text2 2 xxx
3 3 Text3 3
3 3 Text3 3
I am trying to use pandas but the columns in the final output get sorted alphabetically. Is there any way to avoid the alphabetical sorting / control the column order?
import pandas as pd
import glob
files = glob.glob("C:\CSVs\*.csv")
df_list = []
for filename in sorted(files):
df_list.append(pd.read_csv(filename))
full_df = pd.concat(df_list, ignore_index= True)
full_df.to_csv('output3.csv')
I am getting
Column1 Column2 Column3 Column4 Column5
...
Upvotes: 1
Views: 126
Reputation: 6556
Try this way out, it will work dynamically according to the header of files:
import pandas as pd
import glob
files = glob.glob("./CSVs/*.csv")
df_list = []
col_list=[]
for filename in sorted(files):
df = pd.read_csv(filename)
df_list.append(df)
for e in list(df.columns):
if e not in new_col_list:
col_list.append(e)
full_df = pd.concat(df_list)
full_df = full_df[col_list]
full_df
Output:
Column2 Column1 Column4 Column3 Column5
0 1 1 Text1 NaN NaN
1 1 1 Text1 NaN NaN
0 2 2 Text2 2 xxx
1 2 2 Text2 2 xxx
0 3 3 Text3 3 NaN
1 3 3 Text3 3 NaN
Upvotes: 1
Reputation: 2927
>>> df = file1.append(file2).append(file3)
>>> df.reset_index(inplace=True)
>>> df.reindex(columns=['Column2', 'Column1', 'Column4', 'Column3', 'Column5'])
Column2 Column1 Column4 Column3 Column5
0 1 1 Text1 NaN NaN
1 1 1 Text1 NaN NaN
2 2 2 Text2 2.0 xxx
3 2 2 Text2 2.0 xxx
4 3 3 Text3 3.0 NaN
5 3 3 Text3 3.0 NaN
Upvotes: 0
Reputation: 2245
If you just want to set the final column order, you can try this:
full_df = full_df.reindex_axis(['Column2', 'Column1', 'Column4', 'Column3', 'Column5'], axis=1)
full_df.to_csv('output3.csv')
Upvotes: 0
Reputation: 905
import pandas as pd
import glob
import itertools
files = glob.glob("C:\CSVs\*.csv")
df_list = []
new_col_list = []
for filename in sorted(files):
x = pd.read_csv(filename)
df_list.append(x)
new_col_list.append(x.columns)
full_df = pd.concat(df_list, ignore_index= True)
new_col_list_merged = list(itertools.chain.from_iterable(new_col_list))
full_df_updated = full_df[new_col_list_merged ]
full_df.to_csv('output3.csv')
Hope this should work
Upvotes: 0