Pandas read csv using column names included in a list

Question

I'm quite new to Pandas. I'm trying to create a dataframe by reading thousands of csv files.
The files are not all structured the same way, and I only want to extract the columns I'm interested in. I created a list that includes all the column names I want, but I get an error because not every file contains all of them.

import pandas as pd
import numpy as np
import os
import glob

# select the csv folder
csv_folder= r'myPath'

# select all csv files within the folder
all_files = glob.glob(csv_folder + "/*.csv")

# Set the column names to include in the dataframe
columns_to_use = ['Name1', 'Name2', 'Name3', 'Name4', 'Name5', 'Name6']

# read the csv files one by one
for filename in all_files:
    df = pd.read_csv(filename,
                     header=0,
                     usecols = columns_to_use)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
      1 for filename in all_files:
----> 2     df = pd.read_csv(filename,
      3                      header=0,
      4                     usecols = columns_to_use)
      5 

ValueError: Usecols do not match columns, columns expected but not found: ['Name1', 'Name2', 'Name4']

How can I handle this so that a column is read only if it is both in the list and present in the file?

Stef · Accepted Answer

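Use a callable for usecols, i.e. df = pd.read_csv(filename, header=0, usecols=lambda c: c in columns_to_use). The callable is only applied to the column names that actually exist in the file, so names missing from a file do not raise an error. From the docs of the usecols parameter: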

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True.

Working example that will only read col1 and not throw an error on missing col3:

import pandas as pd
import io

s = """col1,col2
1,2"""

df = pd.read_csv(io.StringIO(s), usecols=lambda c: c in ['col1', 'col3'])
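Applied to the loop from the question, the same idea could look roughly like the sketch below. It is only an illustration under a few assumptions: the two in-memory CSV strings in `sources` (and their names 'a.csv' and 'b.csv') are hypothetical stand-ins for the real files, and collecting the frames in a list and combining them with pd.concat is one possible way to build a single dataframe, not something the question specifies.

```python
import io

import pandas as pd

columns_to_use = ['Name1', 'Name2', 'Name3']
wanted = set(columns_to_use)  # set membership is cheap to test per column

# Hypothetical stand-ins for files on disk; with real files you would
# loop over glob.glob(csv_folder + "/*.csv") and pass each path instead.
sources = {
    'a.csv': "Name1,Name2,Other\n1,2,3\n",
    'b.csv': "Name2,Name3\n4,5\n",
}

frames = []
for name, text in sources.items():
    # Keep only the wanted columns that this particular file contains.
    frame = pd.read_csv(io.StringIO(text), header=0,
                        usecols=lambda c: c in wanted)
    frames.append(frame)

# Stack the per-file frames; columns missing from a file become NaN.
combined = pd.concat(frames, ignore_index=True, sort=False)

# Put the columns in the order given by columns_to_use.
combined = combined.reindex(columns=columns_to_use)
```

Note that the loop in the question overwrites df on every iteration, so only the last file would survive; appending each frame to a list avoids that.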
