jules325
jules325

Reputation: 179

Concatenate 2 Rows to be header/column names

I have an excel sheet that is really poorly formatted. The actual column names I would like to use are across two rows; For example, if the correct column name should be Labor Percent, cell A1 would contain Labor, and cell A2 would contain Percent).

I try to load the file, here's what I'm doing:

import os
os.getcwd()
os.chdir(r'xxx')

import pandas as pd

file = 'problem.xls'
xl = pd.ExcelFile(file)
print(xl.sheet_names)

df = xl.parse('WEEKLY NUMBERS', skiprows=35)

enter image description here

As you can see in the picture, the remainder of what should be the column name is in the second row. Is there a way to rename the columns by concatenating? Can this somehow be done with the header= argument in the xl.parse bit?

Upvotes: 0

Views: 2324

Answers (2)

sjw
sjw

Reputation: 6543

Here's something you can try. Essentially it reads in the first two rows as your header, but treats it as a hierarchical multi-index. The second line of code below then flattens that multi-index down to a single string. I'm not 100% certain it will work for your data but is worth a try - it worked for the small dummy test data I tried it with:

df = pd.read_excel('problem.xlsx', sheetname='WEEKLY NUMBERS', header=[0, 1])
df.columns = df.columns.map(' '.join)

The second line was taken from this answer about flattening a multi-index.

Upvotes: 0

max
max

Reputation: 4521

You can rename the columns yourself by setting:

df.columns = ['name1', 'name2', 'name3' ...]

Note that you must specify a name for every column.

Then drop the first row to get rid of the unwanted row of column names.

df = df.drop(0)

Upvotes: 2

Related Questions