Let's take some sample data. Imagine I have two CSV files in one directory. When I import them separately, I get these two dataframes:
# first one
   Col1  Col2  Col3
0     1   2.0   2.0
1     2   4.8   4.8
2     3   1.0   1.0
# second one
   ColA  ColB  ColC  ColD
0     5   9.0   2.9   2.2
1     1   7.8   2.2   9.8
2     2   2.0   7.0   1.6
I would like to import them into a single dataframe, given that I have the following dictionary:
Dict_col_names = {
"Col1" : "ColA",
"Col2" : "ColB",
"Col3" : "ColC"
}
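A minimal sketch of how such a dictionary interacts with DataFrame.rename. The keys are the first file's names and the values the second file's, so since the expected output uses the first file's names, the mapping is inverted before renaming the second type of file. The name inverse_names and the in-memory frame standing in for the second file are illustrative assumptions, not part of the original data pipeline.

```python
import pandas as pd

Dict_col_names = {
    "Col1": "ColA",
    "Col2": "ColB",
    "Col3": "ColC",
}

# Hypothetical name: invert the mapping (second-file name -> first-file name)
inverse_names = {v: k for k, v in Dict_col_names.items()}

# In-memory stand-in for a file of the second type
second = pd.DataFrame({
    "ColA": [5, 1, 2],
    "ColB": [9.0, 7.8, 2.0],
    "ColC": [2.9, 2.2, 7.0],
    "ColD": [2.2, 9.8, 1.6],
})

# rename only touches labels found in the mapping, so ColD is left as-is
renamed = second.rename(columns=inverse_names)
print(list(renamed.columns))  # ['Col1', 'Col2', 'Col3', 'ColD']
```

Note that rename does not drop unmapped columns, so ColD still has to be excluded in a separate step.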
Through Dict_col_names I therefore have the correspondence between the column names of the two file types. If a column name is not in the dictionary, I don't want to import it. I know that the basic code to import all the CSV files of a directory into a single dataframe is the following:
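import glob
import os
import pandas as pd
path_train = r'C:\Users\XXX'
# collect every CSV file in the directory
all_files_train = glob.glob(os.path.join(path_train, "*.csv"))
# read each file and stack the results vertically into one dataframe
df = pd.concat((pd.read_csv(f, sep=",") for f in all_files_train), sort=False)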
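To see why this basic code does not give the expected output, here is a small sketch with the two sample frames built in memory instead of read from disk (an assumption made only so the example runs on its own). pd.concat aligns on column labels, so differently named columns end up side by side and are padded with NaN, and each file's row labels are kept, so the index repeats.

```python
import pandas as pd

first = pd.DataFrame({"Col1": [1, 2, 3],
                      "Col2": [2.0, 4.8, 1.0],
                      "Col3": [2.0, 4.8, 1.0]})
second = pd.DataFrame({"ColA": [5, 1, 2],
                       "ColB": [9.0, 7.8, 2.0],
                       "ColC": [2.9, 2.2, 7.0],
                       "ColD": [2.2, 9.8, 1.6]})

# Default concat is an outer join on column labels: 3 + 4 = 7 columns
stacked = pd.concat([first, second], sort=False)
print(stacked.shape)        # (6, 7)
# The original row labels are kept, so the index restarts for each frame
print(list(stacked.index))  # [0, 1, 2, 0, 1, 2]
```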
But I can't find a way to modify it to do what I want (in reality, I have many CSV files of both types). Could you please help me?
Expected output:
   Col1  Col2  Col3
0     1   2.0   2.0
1     2   4.8   4.8
2     3   1.0   1.0
3     5   9.0   2.9
4     1   7.8   2.2
5     2   2.0   7.0
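Try with rename. Dict_col_names maps the first file's names to the second file's names, so to end up with Col1, Col2, Col3 as in the expected output, rename with the inverted dictionary. ignore_index=True renumbers the rows 0 to 5:
Dict_inv = {v: k for k, v in Dict_col_names.items()}
df = pd.concat((pd.read_csv(f, sep=",").rename(columns=Dict_inv) for f in all_files_train), sort=False, ignore_index=True)
If the files contain different columns, add join='inner' so that only the columns present in every file are kept (here this drops ColD):
df = pd.concat((pd.read_csv(f, sep=",").rename(columns=Dict_inv) for f in all_files_train), sort=False, join='inner', ignore_index=True)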
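join='inner' only drops columns missing from at least one file; a column outside the dictionary that happens to appear in every file would still be imported. Below is a minimal sketch of a stricter variant, assuming every file uses either the keys or the values of Dict_col_names. The helper load_directory is hypothetical: it passes a callable to read_csv's usecols so unmapped columns are never read, renames with the inverted dictionary, and renumbers the rows. The temporary directory and the file names a.csv and b.csv only stand in for the real data.

```python
import glob
import os
import tempfile

import pandas as pd

Dict_col_names = {"Col1": "ColA", "Col2": "ColB", "Col3": "ColC"}


def load_directory(path, mapping):
    """Hypothetical helper: read every CSV in `path` into one frame.

    Only columns that are a key or a value of `mapping` are read; the
    result is named with the mapping's keys, in the mapping's order.
    """
    inverse = {v: k for k, v in mapping.items()}
    wanted = set(mapping) | set(inverse)
    frames = []
    for f in sorted(glob.glob(os.path.join(path, "*.csv"))):
        # usecols accepts a callable evaluated against each column name
        part = pd.read_csv(f, sep=",", usecols=lambda c: c in wanted)
        frames.append(part.rename(columns=inverse))
    # ignore_index=True renumbers the rows 0..n-1 as in the expected output
    combined = pd.concat(frames, ignore_index=True, sort=False)
    return combined[list(mapping)]


# Stand-in files reproducing the sample data (written without an index column)
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.csv"), "w") as fh:
        fh.write("Col1,Col2,Col3\n1,2.0,2.0\n2,4.8,4.8\n3,1.0,1.0\n")
    with open(os.path.join(tmp, "b.csv"), "w") as fh:
        fh.write("ColA,ColB,ColC,ColD\n5,9.0,2.9,2.2\n1,7.8,2.2,9.8\n2,2.0,7.0,1.6\n")
    df = load_directory(tmp, Dict_col_names)

print(df)
```

Reading with usecols also avoids loading unwanted columns into memory, which may matter when there are many large files.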