Reputation: 165
I am trying to append all worksheets in multiple excel into pandas dataframe. All the sheets have the same columns. I need to skip top 9 rows ad remove footer after 50. Row 10 will become the header.
I wrote the following code:
source_dataset = r"C:\location\xxx"
out_df = pd.DataFrame()
for f in source_dataset:
cdf = [pd.read_excel ('excel file name.xlxs',
sheet_name = ['sheet_1',
'sheet_2',
'sheet_3',
'sheet_4',
],
skiprows = 9,
skipfooter = 50).values()
for excel_names in source_dataset]
out_df = pd.concat([pd.concat(x) for x in cdf], ignore_index=True, sort=True)
I get the following warning:
c:\users\xxx\appdata\local\programs\python\python37\lib\site-packages\ipykernel_launcher.py:35: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
The file I get is not OK as one column is repeated twice.
1) What I am doing wrong?
2) Where I should insert sort=False/True? I tried here but is not working:
out_df = pd.concat([pd.concat(x) for x in cdf], ignore_index=True, sort=True)
Thanks
Upvotes: 1
Views: 255
Reputation: 862671
You can pass sort=True
to both concat
:
out_df = pd.concat([pd.concat(x, sort=True) for x in cdf], ignore_index=True, sort=True)
Upvotes: 1