Reputation: 439
I am using pandas to read in an Excel file that has the following structure:
A B C D E
╔════════════════════╦════════════╦═══════╦═══════╦══════════╗
1 ║ Project Name ║ ║ ║ ║ ║
╠════════════════════╬════════════╬═══════╬═══════╬══════════╣
2 ║ Date of generation ║ ║ ║ ║ ║
╠════════════════════╬════════════╬═══════╬═══════╬══════════╣
3 ║ ID ║ Type ║ Value ║ Color ║ Capacity ║
╠════════════════════╬════════════╬═══════╬═══════╬══════════╣
4 ║ 1 ║ Car ║ 1,000 ║ Red ║ 2 ║
╠════════════════════╬════════════╬═══════╬═══════╬══════════╣
5 ║ 2 ║ Truck ║ 800 ║ Green ║ 12 ║
╠════════════════════╬════════════╬═══════╬═══════╬══════════╣
6 ║ 3 ║ Helicopter ║ 5,000 ║ White ║ 4 ║
╚════════════════════╩════════════╩═══════╩═══════╩══════════╝
As you can see, the data headings are on row 3, and the information continues underneath. If I read the frame in like so:
df = pandas.read_excel("sample.xls", sheetname=0, skiprows=2, header=0)
my dataframe has the correct portion of data, but I'm unable to read the information in the first cells, A1 and A2.
I've tried the following as well, to 'slice' a new dataframe once I have the top two values taken. I get an error:
df = pandas.read_excel("sample.xls", sheetname=0, skiprows=0, header=None)
project_name = df.iloc[0][0]
project_date = df.iloc[1][0]
new_header = df.iloc[2] #grab the first row for the header
df = df[3:] #take the data less the header row
df.rename(columns = new_header) #set the header row as the df header
#tidyup
df = df.fillna('') #convert NaN (blank) cells to empty string
df.applymap(str) #convert all columns to str
df.columns = df.columns.str.strip() #strip whitespace before and after headers
df_obj = df.select_dtypes(['object']) #select all object columns (not int/float columns)
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip()) #strip (trim whitepace) of all object columns
Running the above I get an error:
AttributeError: Can only use .str accessor with string values (i.e. inferred_type is 'string', 'unicode' or 'mixed')
Its not clear to me why as the same 'tidyups section but reading the df as in my first code line above work.
Is there a way I can get to the 'above header' information without creating a whole new dataframe by reading the file twice?
Upvotes: 2
Views: 187
Reputation: 862396
There is problem you forget to assign back:
df = df.applymap(str)
Also better (faster) is instead applymap
use astype
:
df = df.astype(str)
All together:
project_name = df.iloc[0, 0]
project_date = df.iloc[1, 0]
new_header = df.iloc[2]
df = df[3:]
df.columns = new_header.str.strip()
df.columns.name = None
df = df.astype(str).reset_index(drop=True)
df = df.apply(lambda x: x.str.strip())
print (df)
ID Type Value Color Capacity
0 1 Car 1,000 Red 2
1 2 Truck 800 Green 12
2 3 Helicopter 5,000 White 4
Upvotes: 3