Reputation: 21
Python newbie, please be gentle. I have data in two "middle sections" of a multiple Excel spreadsheets that I would like to isolate into one pandas dataframe. Below is a link to a data screenshot. Within each file, my headers are in Row 4 with data in Rows 5-15, Columns B:O. The headers and data then continue with headers on Row 21, data in Rows 22-30, Columns B:L. I would like to move the headers and data from the second set and append them to the end of the first set of data.
This code captures the header from Row 4 and data in Columns B:O but captures all Rows under the header including the second Header and second set of data. How do I move this second set of data and append it after the first set of data?
path =r'C:\Users\sarah\Desktop\Original'
allFiles = glob.glob(path + "/*.xls")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
df = pd.read_excel(file_,sheetname="Data1", parse_cols="B:O",index_col=None, header=3, skip_rows=3 )
list_.append(df)
frame = pd.concat(list_)
Upvotes: 2
Views: 1197
Reputation: 8683
If all of your Excel files have the same number of rows and this is a one time operation, you could simply hard code those numbers in your read_excel
. If not, it will be a little tricky, but you pretty much follow the same procedure:
for file_ in allFiles:
top = pd.read_excel(file_, sheetname="Data1", parse_cols="B:O", index_col=None,
header=4, skip_rows=3, nrows=14) # Note the nrows kwag
bot = pd.read_excel(file_, sheetname="Data1", parse_cols="B:L", index_col=None,
header=21, skip_rows=20, nrows=14)
list_.append(top.join(bot, lsuffix='_t', rsuffix='_b'))
Upvotes: 1
Reputation: 210832
you can do it this way:
df1 = pd.read_excel(file_,sheetname="Data1", parse_cols="B:O",index_col=None, header=3, skip_rows=3)
df2 = pd.read_excel(file_,sheetname="Data1", parse_cols="B:L",index_col=None, header=20, skip_rows=20)
# pay attention at `axis=1`
df = pd.concat([df1,df2], axis=1)
Upvotes: 0