Kvothe

Reputation: 1393

Pandas reading csv files with partial wildcard

I'm trying to write a script that imports a file, then does something with the file and outputs the result into another file.

df = pd.read_csv('somefile2018.csv')

The above code works perfectly fine. However, I'd like to avoid hardcoding the file name in the code.

The script will be run from the folder (directory) that contains script.py and several CSV files.

I've tried the following:

somefile_path = glob.glob('somefile*.csv')

df = pd.read_csv(somefile_path)

But I get the following error:

ValueError: Invalid file path or buffer object type: <class 'list'>

Upvotes: 13

Views: 23846

Answers (6)

Hrushi

Reputation: 325

We should use concat from now on, since DataFrame.append was deprecated and has been removed in pandas 2.0.

import pandas as pd
from glob import glob

def read_pattern(patt):
    files = glob(patt)
    # Read every matching file first, then concatenate once
    # (calling concat inside the loop re-copies the data on every iteration)
    frames = [pd.read_csv(f, low_memory=False) for f in files]
    if not frames:
        # pd.concat raises ValueError on an empty list
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

df = read_pattern('*.csv')

Replace '*.csv' with the pattern for your particular path, e.g. read_pattern('somefile*.csv').

Upvotes: 0

Nick_Jo

Reputation: 111

I'm adding this because the other answers didn't quite work for me as a new user. The code below works and is easy to copy and paste.

import glob
import pandas as pd

csv_file_path = glob.glob('./*.csv')
# glob returns a list; with exactly one match, read that single path
df_in = pd.read_csv(csv_file_path[0])

I've tested this many times for single files. Not sure about multiple files.
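Because this approach relies on exactly one file matching, a slightly safer variant is to check the match count before reading. The helper name read_single_match below is hypothetical, a sketch rather than anything built into pandas or glob:

```python
import glob

import pandas as pd


def read_single_match(pattern):
    """Read the single CSV file matching `pattern` (hypothetical helper)."""
    matches = glob.glob(pattern)
    # Fail loudly instead of guessing when zero or several files match
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one file matching {pattern!r}, "
            f"found {len(matches)}: {matches}"
        )
    return pd.read_csv(matches[0])
```

Usage: df_in = read_single_match('./*.csv'). If more than one CSV sits in the folder, you get a clear error listing the matches instead of a confusing path error.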

Upvotes: 1

pleicht17

Reputation: 279

To read all of the files that follow a certain pattern, so long as they share the same schema, use this function:

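import glob
import pandas as pd

def pd_read_pattern(pattern):
    files = glob.glob(pattern)

    # DataFrame.append was removed in pandas 2.0, so collect the
    # frames in a list and concatenate them once
    frames = [pd.read_csv(f) for f in files]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

df = pd_read_pattern('somefile*.csv')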

This will work with either an absolute or relative path.
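Since the question says the CSVs live next to script.py, one option is to build an absolute pattern from a directory so the result does not depend on where the script is launched from. This is a sketch; read_pattern_in is a hypothetical name, and the sorting step is an addition because glob does not guarantee any order:

```python
import glob
import os

import pandas as pd


def read_pattern_in(directory, pattern):
    """Concatenate every CSV in `directory` whose name matches `pattern` (hypothetical helper)."""
    # An absolute pattern makes the result independent of the current working directory
    full_pattern = os.path.join(os.path.abspath(directory), pattern)
    # glob does not guarantee an order, so sort for a reproducible row order
    files = sorted(glob.glob(full_pattern))
    frames = [pd.read_csv(f) for f in files]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Inside script.py you could then call read_pattern_in(os.path.dirname(os.path.abspath(__file__)), 'somefile*.csv') to read the files sitting beside the script.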

Upvotes: 7

iDrwish

Reputation: 3103

You can get the list of the CSV files in the script and loop over them.

import os
from os import listdir
from os.path import isfile, join

import pandas as pd

mypath = os.getcwd()

# Keep only regular files whose names end in .csv
csvfiles = [f for f in listdir(mypath) if isfile(join(mypath, f)) and f.endswith('.csv')]

for f in csvfiles:
    df = pd.read_csv(join(mypath, f))
    # the rest of your script
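If you want to keep every file's DataFrame rather than overwrite df on each pass, the same listing can be done with pathlib and stored in a dict keyed by file name. This is a sketch that swaps os.listdir for pathlib.Path.glob; read_csvs_by_name is a hypothetical helper:

```python
from pathlib import Path

import pandas as pd


def read_csvs_by_name(directory="."):
    """Map each .csv file name in `directory` to its DataFrame (hypothetical helper)."""
    # Path.glob matches the .csv suffix; is_file() skips directories named like "x.csv"
    return {
        path.name: pd.read_csv(path)
        for path in sorted(Path(directory).glob("*.csv"))
        if path.is_file()
    }
```

For example, frames = read_csvs_by_name() followed by frames['somefile2018.csv'] gives you one specific file's data.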

Upvotes: 2

James

Reputation: 36623

glob returns a list, not a string. read_csv expects a single path (a string or path object) or a file-like object, so pass it one file at a time. Try this:

from glob import glob
import pandas as pd

for f in glob('somefile*.csv'):
    df = pd.read_csv(f)
    # the rest of your script
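The question also mentions writing each result to another file. A minimal sketch of that loop, assuming each output goes next to its input with a prefix; process_matching, transform, and the "processed_" prefix are all hypothetical choices:

```python
import os
from glob import glob

import pandas as pd


def process_matching(pattern, transform, prefix="processed_"):
    """Apply `transform` to each CSV matching `pattern` and save each result next to its input."""
    written = []
    # sorted(glob(...)) builds the full list before the loop starts writing,
    # so output files created here are not picked up as inputs
    for path in sorted(glob(pattern)):
        df = pd.read_csv(path)
        result = transform(df)
        folder, name = os.path.split(path)
        out_path = os.path.join(folder, prefix + name)
        result.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

For example, process_matching('somefile*.csv', lambda df: df.dropna()) would write processed_somefile2018.csv alongside somefile2018.csv. A prefix (rather than a suffix) keeps outputs from matching 'somefile*.csv' on a later run.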

Upvotes: 21

Zeugma

Reputation: 32095

Loop over the files, build a list of DataFrames, then combine them with pd.concat.
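A minimal sketch of that approach follows. The source_file column is an optional, hypothetical addition that records which file each row came from, and concat_with_source is an illustrative name:

```python
import glob

import pandas as pd


def concat_with_source(pattern):
    """Read every CSV matching `pattern` and stack them, tagging each row with its file."""
    frames = []
    for f in sorted(glob.glob(pattern)):
        # Hypothetical extra column recording where each row came from
        frames.append(pd.read_csv(f).assign(source_file=f))
    if not frames:
        # pd.concat raises ValueError when given an empty list
        raise FileNotFoundError(f"no files match {pattern!r}")
    # One concat call at the end, rather than growing a DataFrame inside the loop
    return pd.concat(frames, ignore_index=True)
```

Calling concat once on the collected list avoids repeatedly copying a growing DataFrame, and ignore_index=True gives the result a clean 0..n-1 index.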

Upvotes: 1
