Reputation: 29
I have some CSV files with different columns, and I need to merge them into one file. Here is my code:
import os, glob
import pandas as pd
path = ""
all_files = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file = (pd.read_csv(f, sep=',') for f in all_files)
df_merged = pd.concat(df_from_each_file, ignore_index=True, axis=1)
df_merged.to_csv( "merged.csv")
This code labels the columns with numbers instead of their names. How can I keep the column names in the merged file?
Thanks for your help.
Upvotes: 0
Views: 68
Reputation: 4023
This sounds like a direct implementation of one of the Pandas examples for concat(). Copying the relevant example from their documentation:
>>> df1 = pd.DataFrame([['a', 1], ['b', 2]],
...                    columns=['letter', 'number'])
>>> df1
  letter  number
0      a       1
1      b       2
>>> df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
...                    columns=['letter', 'number', 'animal'])
>>> df3
  letter  number animal
0      c       3    cat
1      d       4    dog
>>> pd.concat([df1, df3], sort=False)
  letter  number animal
0      a       1    NaN
1      b       2    NaN
0      c       3    cat
1      d       4    dog
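I usually like to call df.reset_index(drop=True) on the resulting DataFrame df as well, since having duplicate values in the index can cause unexpected behavior. If you're about to do a join on one of the columns, though, it won't matter. You've already got ignore_index=True in your sample code, which takes care of that, but note that it applies to the axis you concatenate along: with axis=1 it is the column labels that get replaced by 0, 1, 2, ..., which is why your names disappear. Drop axis=1 (so the frames are stacked row-wise, as in the example above) and the column names are kept while the rows are renumbered.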
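Applied to the code in the question, that could look roughly like the sketch below. The function name merge_csv_files and the step that skips the output file are my own additions for illustration, not something from the question or the pandas docs, so adjust them to your setup.

```python
import glob
import os

import pandas as pd


def merge_csv_files(path, output_file="merged.csv"):
    """Stack every CSV found in `path` row-wise, keeping column names.

    `merge_csv_files` is a hypothetical helper name used for this sketch.
    """
    all_files = sorted(glob.glob(os.path.join(path, "*.csv")))

    # Assumption: the output may live in the same folder, so skip it to
    # avoid merging a previous result back in on a second run.
    all_files = [
        f for f in all_files
        if os.path.abspath(f) != os.path.abspath(output_file)
    ]

    frames = [pd.read_csv(f) for f in all_files]

    # Default axis=0 stacks rows and aligns columns by name; columns missing
    # from a file are filled with NaN. ignore_index=True renumbers the rows.
    merged = pd.concat(frames, ignore_index=True, sort=False)

    # index=False keeps the row numbers out of the written file.
    merged.to_csv(output_file, index=False)
    return merged
```

Calling merge_csv_files("some_folder") would then write merged.csv with a header row made of the union of all column names. Note that pd.concat raises a ValueError if the folder contains no CSV files.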
Upvotes: 1