Sarah
Sarah

Reputation: 21

Moving rows of data within pandas dataframe to end of last column

Python newbie, please be gentle. I have data in two "middle sections" of a multiple Excel spreadsheets that I would like to isolate into one pandas dataframe. Below is a link to a data screenshot. Within each file, my headers are in Row 4 with data in Rows 5-15, Columns B:O. The headers and data then continue with headers on Row 21, data in Rows 22-30, Columns B:L. I would like to move the headers and data from the second set and append them to the end of the first set of data.

This code captures the header from Row 4 and data in Columns B:O but captures all Rows under the header including the second Header and second set of data. How do I move this second set of data and append it after the first set of data?

path =r'C:\Users\sarah\Desktop\Original' 
allFiles = glob.glob(path + "/*.xls")

frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
  df = pd.read_excel(file_,sheetname="Data1", parse_cols="B:O",index_col=None, header=3, skip_rows=3 )

list_.append(df)
frame = pd.concat(list_)

Screenshot of my data

enter image description here

Upvotes: 2

Views: 1197

Answers (2)

Kartik
Kartik

Reputation: 8683

If all of your Excel files have the same number of rows and this is a one time operation, you could simply hard code those numbers in your read_excel. If not, it will be a little tricky, but you pretty much follow the same procedure:

for file_ in allFiles:
    top = pd.read_excel(file_, sheetname="Data1", parse_cols="B:O", index_col=None,
                        header=4, skip_rows=3, nrows=14) # Note the nrows kwag
    bot = pd.read_excel(file_, sheetname="Data1", parse_cols="B:L", index_col=None,
                        header=21, skip_rows=20, nrows=14)
    list_.append(top.join(bot, lsuffix='_t', rsuffix='_b'))

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

you can do it this way:

df1 = pd.read_excel(file_,sheetname="Data1", parse_cols="B:O",index_col=None, header=3, skip_rows=3)
df2 = pd.read_excel(file_,sheetname="Data1", parse_cols="B:L",index_col=None, header=20, skip_rows=20)
# pay attention at `axis=1`
df = pd.concat([df1,df2], axis=1)

Upvotes: 0

Related Questions