Matteo
Matteo

Reputation: 165

append multiple sheets from excel into pandas dataframe - sort problems

I am trying to append all worksheets in multiple excel into pandas dataframe. All the sheets have the same columns. I need to skip top 9 rows ad remove footer after 50. Row 10 will become the header.

I wrote the following code:

source_dataset = r"C:\location\xxx"

out_df = pd.DataFrame()

for f in source_dataset:
    cdf = [pd.read_excel ('excel file name.xlxs',
                     sheet_name = ['sheet_1',
                                   'sheet_2',
                                   'sheet_3',
                                   'sheet_4',
                                   ],
                      skiprows = 9,
                      skipfooter = 50).values()

for excel_names in source_dataset]

out_df = pd.concat([pd.concat(x) for x in cdf], ignore_index=True, sort=True)

I get the following warning:

c:\users\xxx\appdata\local\programs\python\python37\lib\site-packages\ipykernel_launcher.py:35: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

The file I get is not OK as one column is repeated twice.

1) What I am doing wrong?

2) Where I should insert sort=False/True? I tried here but is not working:

out_df = pd.concat([pd.concat(x) for x in cdf], ignore_index=True, sort=True)

Thanks

Upvotes: 1

Views: 255

Answers (1)

jezrael
jezrael

Reputation: 862671

You can pass sort=True to both concat:

out_df = pd.concat([pd.concat(x, sort=True) for x in cdf], ignore_index=True, sort=True)

Upvotes: 1

Related Questions