xMothas
xMothas

Reputation: 23

How to add header to a Dataframe (created from excel with ".parse") without overwriting the first Row in Excel

I want to read multiple Excel-Files from multiple sheets and files. The biggest issue is just to add an header (because the original excel-Files don't have one) to a generated dataframe of a sheet, because all used methods still overwriting the values in line/row 1 (A1, B1 ...).

Code at the moment:

# before the path was declared 
xl = pd.ExcelFile(file)

# there is additional code above which reads all the files in a folder, selecting the relevant ones and open them
sheetHeader = ["A", "B", "C", "D", "E", "F", "G", "H"]

# rotation through all excel-sheets starts 
for sheetNames in xl.sheet_names:
     df_sheetExtraction = xl.parse(sheet_names = sheetNames, 
                                   header = 0, 
                                   names = sheetHeader)

     #example for asking at the specific cell value
     if df_sheetExtraction["B"][1] == searchedValue:
           pass
     

The problem as an example (in list-form) ...

Excel: [[11, 12, 13, 14 ...],[21, 22, 23, 24 ...], [31, 32, 33, 34 ...] ...]

Dataframe: [[A, B, C, D ...], [21, 22, 23, 24 ...], [31, 32, 33, 34 ...] ...]

instead of [[A, B, C, D ...], [11, 12, 13, 14 ...] ...]

I tried it without the xl.parse(..., names = ...) and add the columns afterwards df_sheetExtraction.columns = sheetHeader with the same result. Different usages in xl.parse(..., header = ...) won't affect the targeted result.

Is there a way to avoid combining the dataframe with a separate "header-dataframe"?

Thx for your help

Upvotes: 2

Views: 1029

Answers (1)

user1717828
user1717828

Reputation: 7223

The header parameter says (emphasis mine):

header : int, list of int, default 0 Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there is no header.

If you are looking to add your own list as the column names without replacing any of the data in the spreadsheet, you can set header=None and then the names, such as

dfs = pd.read_excel(xlsx_filename,
                       sheet_name=[sheet_name_1,sheet_name_2],
                       header=None,
                       names=['Column_A','Column_B'])

Upvotes: 1

Related Questions