How to import multiple csv in one dataframe with different column names and unimportant columns?

Question

Here is some sample data. Suppose I have two CSV files in one directory. When I import them separately, I get these two dataframes:

# first one
   Col1  Col2  Col3
0     1   2.0   2.0
1     2   4.8   4.8
2     3   1.0   1.0

# second one
   ColA  ColB  ColC  ColD
0     5   9.0   2.9   2.2
1     1   7.8   2.2   9.8
2     2   2.0   7.0   1.6

I would like to import them into a single dataframe, using the following dictionary:

Dict_col_names = {
    "Col1" : "ColA",
    "Col2" : "ColB",
    "Col3" : "ColC"
}

This dictionary gives the correspondence between column names. If a column name is not in the dictionary, I don't want to import it. I know that the basic code to import all the CSV files in a directory into a single dataframe is the following:

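import glob
import os
import pandas as pd
path_train = r'C:\Users\XXX'
# list every .csv file in the directory
all_files_train = glob.glob(os.path.join(path_train, "*.csv"))
# read each file and stack the resulting frames vertically
df = pd.concat((pd.read_csv(f, sep=",") for f in all_files_train), sort=False)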

But I can't find a way to modify it to do what I want (in reality, I have many CSV files of both types). Could you please help me?

Expected output:

   Col1  Col2  Col3
0     1   2.0   2.0
1     2   4.8   4.8
2     3   1.0   1.0
3     5   9.0   2.9
4     1   7.8   2.2
5     2   2.0   7.0
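For comparison with the expected output above, here is a minimal sketch of what the unmodified pd.concat call produces on the sample data. It assumes the two files' contents are inlined as strings and read through io.StringIO instead of from disk (csv_first, csv_second and naive are illustrative names). Because concat aligns frames on column names, Col1 and ColA are treated as unrelated columns.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the two CSV files (sample data from the question)
csv_first = "Col1,Col2,Col3\n1,2.0,2.0\n2,4.8,4.8\n3,1.0,1.0\n"
csv_second = "ColA,ColB,ColC,ColD\n5,9.0,2.9,2.2\n1,7.8,2.2,9.8\n2,2.0,7.0,1.6\n"

frames = [pd.read_csv(io.StringIO(text), sep=",") for text in (csv_first, csv_second)]

# concat matches columns by name, so Col1 and ColA end up as separate columns;
# the default outer join keeps all of them and fills the gaps with NaN
naive = pd.concat(frames, sort=False)
print(naive.shape)  # (6, 7)
print(naive.columns.tolist())
```

Each row is NaN in the columns that came from the other file type, which is why the column names have to be aligned before concatenating.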

BENY · Accepted Answer

Try using rename:

df = pd.concat((pd.read_csv(f, sep=",").rename(columns=Dict_col_names) for f in all_files_train), sort=False)
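A sketch that reproduces the expected output exactly, under a few assumptions. rename maps keys (old names) to values (new names), so the dictionary as given turns Col1/Col2/Col3 into ColA/ColB/ColC; inverting it keeps the Col1/Col2/Col3 names instead. Passing a callable as read_csv's usecols skips any column not covered by the dictionary (here ColD) at read time, and ignore_index=True produces the 0 to 5 index. The helper load and the names inverse_names and known are hypothetical, and the sample files are inlined through io.StringIO rather than read from disk.

```python
import io

import pandas as pd

Dict_col_names = {"Col1": "ColA", "Col2": "ColB", "Col3": "ColC"}

# rename() maps old -> new, so invert the dictionary to turn ColA/ColB/ColC into Col1/Col2/Col3
inverse_names = {new: old for old, new in Dict_col_names.items()}
# every column name that may legitimately appear in either type of file
known = set(Dict_col_names) | set(inverse_names)


def load(source):
    """Hypothetical helper: read one CSV, keep only dictionary columns, normalise their names."""
    # a callable usecols is evaluated on each header name; unmapped columns such as ColD are skipped
    frame = pd.read_csv(source, sep=",", usecols=lambda name: name in known)
    # names absent from inverse_names (already Col1/Col2/Col3) are left unchanged
    return frame.rename(columns=inverse_names)


# Hypothetical in-memory stand-ins for the two CSV files
csv_first = "Col1,Col2,Col3\n1,2.0,2.0\n2,4.8,4.8\n3,1.0,1.0\n"
csv_second = "ColA,ColB,ColC,ColD\n5,9.0,2.9,2.2\n1,7.8,2.2,9.8\n2,2.0,7.0,1.6\n"

# ignore_index=True renumbers the rows 0..5 instead of repeating 0, 1, 2 for each file
df = pd.concat((load(io.StringIO(text)) for text in (csv_first, csv_second)), ignore_index=True)
print(df)
```

With real files, the generator would become load(f) for f in all_files_train. Filtering with usecols does not rely on some file lacking the extra columns, so it also works when every file carries columns outside the dictionary.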

If the data frames contain different columns, add join='inner' to keep only the columns they all share:

df = pd.concat((pd.read_csv(f, sep=",").rename(columns=Dict_col_names) for f in all_files_train), sort=False, join='inner')
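A minimal sketch of what the join='inner' version does on the sample data, assuming the files are inlined via io.StringIO (frames and inner are illustrative names). Because the dictionary is applied as-is, the surviving columns are named ColA, ColB and ColC, and without ignore_index=True each file keeps its own 0, 1, 2 index.

```python
import io

import pandas as pd

Dict_col_names = {"Col1": "ColA", "Col2": "ColB", "Col3": "ColC"}

# Hypothetical in-memory stand-ins for the two CSV files
csv_first = "Col1,Col2,Col3\n1,2.0,2.0\n2,4.8,4.8\n3,1.0,1.0\n"
csv_second = "ColA,ColB,ColC,ColD\n5,9.0,2.9,2.2\n1,7.8,2.2,9.8\n2,2.0,7.0,1.6\n"

# rename Col1/Col2/Col3 to ColA/ColB/ColC so both frames share column names
frames = [pd.read_csv(io.StringIO(text), sep=",").rename(columns=Dict_col_names)
          for text in (csv_first, csv_second)]

# join="inner" keeps only the columns present in every frame, so ColD is dropped
inner = pd.concat(frames, sort=False, join="inner")
print(inner.columns.tolist())
print(inner.index.tolist())  # [0, 1, 2, 0, 1, 2]
```

Note that the inner join removes ColD only because at least one file lacks it; if every file in the directory had an extra column, that column would be kept.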
