Reputation: 1393
I'm trying to write a script that imports a file, then does something with the file and outputs the result into another file.
df = pd.read_csv('somefile2018.csv')
The above code works perfectly fine. However, I'd like to avoid hardcoding the file name in the code.
The script will be run in a folder (directory) that contains script.py and several CSV files.
I've tried the following:
somefile_path = glob.glob('somefile*.csv')
df = pd.read_csv(somefile_path)
But I get the following error:
ValueError: Invalid file path or buffer object type: <class 'list'>
Upvotes: 13
Views: 23846
Reputation: 325
Use concat from now on: DataFrame.append was deprecated and has been removed as of pandas 2.0.
import pandas as pd
from glob import glob
def read_pattern(patt):
    files = glob(patt)
    # Create empty dataframe
    df = pd.DataFrame()
    for f in files:
        # Concat instead of append
        df = pd.concat([df, pd.read_csv(f, low_memory=False)])
    return df.reset_index(drop=True)

df = read_pattern('*.csv')
Given the particular path
Upvotes: 0
Reputation: 111
I'm adding this as the other bits didn't quite work for me, a new user. The below code works and is easy to copy and paste.
import glob
import pandas as pd
csv_file_path = glob.glob('./*.csv')
list_into_strings = ''.join(csv_file_path)
df_in = pd.read_csv(list_into_strings)
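I've tested this many times for single files. It only works when exactly one file matches: with several matches, ''.join glues the paths together into a single file name that doesn't exist.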
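The single-file case can be made explicit by checking how many files the pattern matched before reading. This is only a sketch: the helper name read_single_csv is made up for illustration, and raising ValueError on zero or multiple matches is one possible choice, not something the original answer does.

```python
import glob

import pandas as pd


def read_single_csv(pattern):
    """Read the one CSV file matching `pattern` (hypothetical helper)."""
    matches = glob.glob(pattern)
    # Fail loudly instead of joining several paths into one invalid name
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one file matching {pattern!r}, found {len(matches)}"
        )
    return pd.read_csv(matches[0])
```

Calling read_single_csv('somefile*.csv') would then return the DataFrame when there is exactly one match, and raise an error otherwise.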
Upvotes: 1
Reputation: 279
To read all of the files that follow a certain pattern, so long as they share the same schema, use this function:
import glob
import pandas as pd
def pd_read_pattern(pattern):
    files = glob.glob(pattern)
    df = pd.DataFrame()
    for f in files:
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        df = pd.concat([df, pd.read_csv(f)])
    return df.reset_index(drop=True)

df = pd_read_pattern('somefile*.csv')
This will work with either an absolute or relative path.
Upvotes: 7
Reputation: 3103
You can get the list of CSV files in the current working directory and loop over them.
import os
from os import listdir
from os.path import isfile, join
import pandas as pd

mypath = os.getcwd()
# Keep only regular files whose names end in .csv
csvfiles = [f for f in listdir(mypath) if isfile(join(mypath, f)) and f.endswith('.csv')]
for f in csvfiles:
    df = pd.read_csv(join(mypath, f))
    # the rest of your script
Upvotes: 2
Reputation: 36623
glob returns a list, not a string. The read_csv function takes a string as the input to find the file. Try this:
from glob import glob
import pandas as pd

for f in glob('somefile*.csv'):
    df = pd.read_csv(f)
    ...
    # the rest of your script
Upvotes: 21
Reputation: 32095
Loop over each file and build a list of DataFrames, then assemble them together using concat.
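A minimal sketch of that approach, assuming all matching files share the same columns. The function name read_csv_pattern is hypothetical, and the sorting and the FileNotFoundError for an empty match are my own choices, not part of the answer.

```python
import glob

import pandas as pd


def read_csv_pattern(pattern):
    """Read every CSV matching `pattern` into one DataFrame (hypothetical helper)."""
    # glob does not guarantee an order, so sort for a reproducible row order
    paths = sorted(glob.glob(pattern))
    if not paths:
        # pd.concat raises on an empty list, so report the real cause instead
        raise FileNotFoundError(f"no files match {pattern!r}")
    # Build the list of DataFrames first, then concatenate once
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)
```

For example, df = read_csv_pattern('somefile*.csv') would combine every matching file. Concatenating once at the end avoids re-copying the accumulated rows on every loop iteration.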
Upvotes: 1