Reputation: 87
I have the same file with quarterly data saved in different folders corresponding to the quarter. In other words, a quarter 1 folder, quarter 2 , quarter 3, quarter 4. This is the only difference in the file path. I am looking to read all four files in and concatenate them into one database. I can do this manually using a version of the simplified code below and changing the period each time.
period = ‘Q1’
filepath = ‘filepath/’ + period
file = filepath + ‘/file.xls’
df = pd.read_excel(file)
I would like to automate it with some form of for loop (I assume). That loops through the 4 periods, reads the file into a database and then concatenates. I have read other answers as to how this can be done with files in the same folder. But am struggling to do it where the file path changes. Manually putting the files into the same folder is not a desirable solution.
I tried making period a tuple and a list containing all 4 periods then a simple for loop but this didn’t work. I got the following error message.
TypeError: Can't convert 'list' object to str implicitly
Greatly appreciate any advice.
Upvotes: 0
Views: 739
Reputation: 1
You could try something like this:
complete_df = pd.DataFrame()
for i in range(1,5):
quarter = 'Q'+str(i)
filepath = 'filepath/' + quarter
file = filepath + '/file.xls'
df = pd.read_excel(file)
complete_df = complete_df.append(df)
Upvotes: 0
Reputation: 513
You can use these loops to create full file paths and to iterate over them to create one DataFrame containing all the files.
filepath = 'path/'
file = 'file.xlsx'
periods=['Q1','Q2','Q3','Q4']
files = []
for p in periods:
files.append(filepath+p+'/'+file)
files
data = []
for f in files:
data.append(pd.read_excel(f))
df = pd.concat(data)
Upvotes: 1
Reputation: 407
How about you first use list comprehension to get a list of all files:
periods= ["Q1", "Q2", "Q3", "Q4"]
files = ["filepath/"+ p + "/file.xls" for p in periods]
and then load them all into a list of data frames with
dfs = []
for f in files:
df = pd.read_excel(f)
dfs.append(df)
Upvotes: 1
Reputation: 4653
You probably want something like this:
periods = ['Q1', 'Q2', 'Q3', 'Q4']
df = None
for period in periods:
filepath = 'filepath/' + period
file = filepath + '/file.xls'
if df is None:
df = pd.read_excel(file)
else:
df.append(pd.read_excel(file))
Upvotes: 0