Reputation: 23
I want to read multiple Excel-Files from multiple sheets and files. The biggest issue is just to add an header (because the original excel-Files don't have one) to a generated dataframe of a sheet, because all used methods still overwriting the values in line/row 1 (A1, B1 ...).
Code at the moment:
# before the path was declared
xl = pd.ExcelFile(file)
# there is additional code above which reads all the files in a folder, selecting the relevant ones and open them
sheetHeader = ["A", "B", "C", "D", "E", "F", "G", "H"]
# rotation through all excel-sheets starts
for sheetNames in xl.sheet_names:
df_sheetExtraction = xl.parse(sheet_names = sheetNames,
header = 0,
names = sheetHeader)
#example for asking at the specific cell value
if df_sheetExtraction["B"][1] == searchedValue:
pass
The problem as an example (in list-form) ...
Excel: [[11, 12, 13, 14 ...],[21, 22, 23, 24 ...], [31, 32, 33, 34 ...] ...]
Dataframe: [[A, B, C, D ...], [21, 22, 23, 24 ...], [31, 32, 33, 34 ...] ...]
instead of [[A, B, C, D ...], [11, 12, 13, 14 ...] ...]
I tried it without the xl.parse(..., names = ...)
and add the columns afterwards df_sheetExtraction.columns = sheetHeader
with the same result. Different usages in xl.parse(..., header = ...)
won't affect the targeted result.
Is there a way to avoid combining the dataframe with a separate "header-dataframe"?
Thx for your help
Upvotes: 2
Views: 1029
Reputation: 7223
The header
parameter says (emphasis mine):
header : int, list of int, default 0 Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a
MultiIndex
. Use None if there is no header.
If you are looking to add your own list as the column names without replacing any of the data in the spreadsheet, you can set header=None
and then the names, such as
dfs = pd.read_excel(xlsx_filename,
sheet_name=[sheet_name_1,sheet_name_2],
header=None,
names=['Column_A','Column_B'])
Upvotes: 1