Reputation: 91
I have multiple csv files with similar names in numeric order (nba_1, nba_2, etc.). They are all formatted the same in terms of column names and dtypes. Instead of manually pulling each one into a dataframe individually (nba_1 = pd.read_csv('/nba_1.csv')), is there a way to write a for loop or something like it to pull them in and name them? I think the basic framework would be something like:
for i in range(1, 6):
    nba_i = pd.read_csv('../nba_i.csv')
Beyond that, I do not know the particulars. Once I pull them in, I will be performing the same actions on each of them (deleting and formatting the same columns), so I would also want to iterate through them there.
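To show what I mean, here is a rough sketch of the idea using a dict in place of dynamically named variables (the column names and two-file setup are just made up for illustration):

```python
import pandas as pd

# Create small sample files so the sketch is self-contained
# ('player' and 'pts' are placeholder column names)
for i in range(1, 3):
    pd.DataFrame({'player': ['A', 'B'], 'pts': [10, 20]}).to_csv(f'nba_{i}.csv', index=False)

# The loop from above, with a dict instead of variables named nba_1, nba_2, ...
# (variable names cannot be generated in a loop, but dict keys can)
frames = {}
for i in range(1, 3):
    frames[f'nba_{i}'] = pd.read_csv(f'nba_{i}.csv')
```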
Thank you in advance for your help.
Upvotes: 0
Views: 84
Reputation: 62383
Since all the csv files are the same, as you stated in the question, it would be more efficient to combine them all into a single dataframe and then clean the data all at once.
from pathlib import Path
import pandas as pd
p = Path(r'c:\some_path_to_files') # set your path
files = p.glob('nba*.csv') # find your files
# It was stated, all the files are the same format, so create one dataframe
df = pd.concat([pd.read_csv(file) for file in files])
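As a self-contained illustration of that concat step, using tiny in-memory dataframes in place of the csv files (the column names here are made up):

```python
import pandas as pd

# Stand-ins for the dataframes produced by the read_csv list comprehension
parts = [
    pd.DataFrame({'player': ['A', 'B'], 'pts': [10, 20]}),
    pd.DataFrame({'player': ['C'], 'pts': [30]}),
]

# ignore_index=True renumbers the rows 0..n-1 instead of
# repeating each file's original index
df = pd.concat(parts, ignore_index=True)
```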
[pd.read_csv(file) for file in files] is a list comprehension that creates a dataframe from each file, and pd.concat combines all the dataframes in the list.
If the files need to be cleaned individually before being combined, read them into a dict of dataframes instead, where each key of the dict will be a filename:
p = Path(r'c:\some_path_to_files') # set your path
files = p.glob('nba*.csv') # find your files
df_dict = dict()
for file in files:
    df_dict[file.stem] = pd.read_csv(file)
To work with the dataframes in df_dict:
df_dict.keys() # to show you all the keys
df_dict[filename] # to access a specific dataframe
# after cleaning the individual dataframes in df_dict, they can be combined
df_final = pd.concat(df_dict.values())
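A minimal sketch of that per-dataframe cleaning pass, using made-up column names ('PTS', 'season', 'junk' are placeholders, not from the question) and in-memory dataframes in place of the files:

```python
import pandas as pd

# Hypothetical dict of dataframes, as built by the loop above
df_dict = {
    'nba_1': pd.DataFrame({'PTS': [10, 20], 'season': ['2019', '2019'], 'junk': [0, 0]}),
    'nba_2': pd.DataFrame({'PTS': [30], 'season': ['2020'], 'junk': [0]}),
}

# Apply the same cleaning steps to every dataframe in the dict
for name, frame in df_dict.items():
    frame = frame.drop(columns=['junk'])             # delete an unwanted column
    frame = frame.rename(columns={'PTS': 'points'})  # reformat a column name
    df_dict[name] = frame

# Combine the cleaned dataframes into one
df_final = pd.concat(df_dict.values(), ignore_index=True)
```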
Upvotes: 1
Reputation: 1928
The Dask library, built on top of Pandas, has methods to load multiple csv files into a single dataframe at once.
Upvotes: 0