Reputation: 1274
Using the Pandas module and the read_excel function, could I give each column I read in from an Excel file a number as its column header, so that instead of using g_int_c=str(df1['Unnamed: 1'][6]) to refer to a piece of the data in the Excel file, I could use g_int_c=str(df1[1][6])?
Example code is below:
import pandas as pd
with pd.ExcelFile(inputFile,
                  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']) as xlsx:
    df1 = pd.read_excel(xlsx, 'pnl1 Data ',skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])  # assign column headers
    df2 = pd.read_excel(xlsx, 'pnl2 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df3 = pd.read_excel(xlsx, 'pnl3 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df4 = pd.read_excel(xlsx, 'pnl4 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
Upvotes: 1
Views: 21477
Reputation: 1274
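Passing header=None,names=[0,1,2,3,4,5,6] to read_excel worked. Note that skiprows goes from 9 to 10, so that the row previously consumed as the header is skipped rather than read as data: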
with pd.ExcelFile(inputFile,
                  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']) as xlsx:
    df1 = pd.read_excel(xlsx, 'pnl1 Data ',skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])  # assign column headers
    df2 = pd.read_excel(xlsx, 'pnl2 Data', skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])
    df3 = pd.read_excel(xlsx, 'pnl3 Data', skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])
    df4 = pd.read_excel(xlsx, 'pnl4 Data', skiprows=10, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'],header=None,names=[0,1,2,3,4,5,6])
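As a rough illustration of why header=None produces numbered columns, here is a minimal sketch. It uses pd.read_csv on an in-memory string as a stand-in for read_excel (both accept skiprows and header), and the data is made up for the example. With header=None pandas labels the columns 0, 1, 2, ... on its own, so an explicit names list of consecutive integers should not be strictly necessary.

```python
import io

import pandas as pd

# Hypothetical stand-in data: one header row followed by two data rows.
raw = "Unnamed: 0,Unnamed: 1,Unnamed: 2\n10,11,12\n20,21,22\n"

# skiprows=1 drops the header row; header=None tells pandas that no header
# row remains, so the columns are labelled 0, 1, 2 automatically.
df = pd.read_csv(io.StringIO(raw), skiprows=1, header=None)
column_labels = list(df.columns)

# Column label 1, row label 0.
g_int_c = str(df[1][0])

# A frame that was loaded with named columns can be renumbered afterwards.
named = pd.read_csv(io.StringIO(raw))
named.columns = range(named.shape[1])
renumbered_labels = list(named.columns)
```

The second half shows an alternative that leaves the read call untouched and only relabels the columns once the data is loaded.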
Upvotes: 0
Reputation: 879511
To obtain nice column names instead of defaults like 'Unnamed: 1', use the names parameter of pd.read_excel. Mutatis mutandis, try replacing
with pd.ExcelFile(inputFile,
                  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']) as xlsx:
    df1 = pd.read_excel(xlsx, 'pnl1 Data ',skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])  # assign column headers
    df2 = pd.read_excel(xlsx, 'pnl2 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df3 = pd.read_excel(xlsx, 'pnl3 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
    df4 = pd.read_excel(xlsx, 'pnl4 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
with
sheets = ['pnl1 Data','pnl2 Data','pnl3 Data','pnl4 Data']
df = pd.read_excel(inputFile, sheetname=sheets, skiprows=9, parse_cols="B:H",
                   names=list('BCDEFG'))
df = {i: df[sheet] for i, sheet in enumerate(sheets, 1)}
This will make df a dict, whose keys are sheet numbers and whose values are DataFrames. The DataFrames will have column names B through G, roughly like the original Excel file.
Thus, instead of referring to numbered variables df1, ..., df4 (generally, a bad idea), you'll have all the DataFrames in the dict df and will be able to access them by numeric indexing: df[1], ..., df[4]. Sheet pnl3 Data, for example, would be accessed as df[3].
To access the seventh row, B column value of sheet 'pnl1 Data', you could then use:
g_int_c = str(df[1].loc[6, 'B'])
For example,
import pandas as pd
try: from cStringIO import StringIO # for Python2
except ImportError: from io import StringIO # for Python3
import textwrap
df1 = pd.read_csv(StringIO(textwrap.dedent("""
,,,
0,1,2,3
1,4,5,6
7,8,9,10""")))
df2 = pd.read_csv(StringIO(textwrap.dedent("""
,,,
0,NULL,2,3
1,4,NULL,NULL""")), converters={i:str for i in range(4)})
sheets = ['pnl1 Data','pnl2 Data']
writer = pd.ExcelWriter('/tmp/output.xlsx')
for df, sheet in zip([df1, df2], sheets):
    print(df)
    #   Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3
    # 0          0       NULL          2          3
    # 1          1          4       NULL       NULL
    df.to_excel(writer, sheet)
writer.save()
df = pd.read_excel('/tmp/output.xlsx', sheetname=sheets, names=list('ABCD'), parse_cols="A:E")
df = {i: df[sheet] for i, sheet in enumerate(sheets, 1)}
for key, dfi in df.items():
    print(dfi)
# A B C D
# 0 0 1 2 3
# 1 1 4 5 6
# 2 7 8 9 10
# A B C D
# 0 0 NaN 2.0 3.0
# 1 1 4.0 NaN NaN
print(df[1].loc[1, 'B'])
# 4
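The re-keying step can also be tried without an Excel file. The sketch below assumes a hand-built dict (by_name, a hypothetical name with made-up values) in place of the dict that read_excel returns when given a list of sheet names. In recent pandas versions the parameters used above are spelled sheet_name and usecols rather than sheetname and parse_cols.

```python
import pandas as pd

sheets = ['pnl1 Data', 'pnl2 Data']

# Hypothetical stand-in for the dict that read_excel returns when it is
# given a list of sheet names: keys are sheet names, values are DataFrames.
by_name = {
    'pnl1 Data': pd.DataFrame({'B': [1, 4, 8], 'C': [2, 5, 9]}),
    'pnl2 Data': pd.DataFrame({'B': [7, 6, 5], 'C': [4, 3, 2]}),
}

# Re-key the dict by sheet number, starting at 1.
df = {i: by_name[sheet] for i, sheet in enumerate(sheets, 1)}

sheet_numbers = sorted(df)

# Row label 1, column 'B' of the second sheet.
value = str(df[2].loc[1, 'B'])
```

Keeping the frames in one dict means a loop over df.items() can process every sheet, which is harder with separately numbered variables.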
Upvotes: 3
Reputation: 381
From the looks of your question, this isn't about assigning number values to columns upon import, but instead about how to access a given cell of a table by column and row numbers, which is a question specifically about how to index or slice a dataframe by integer.
In your example, you mentioned wanting to refer to df1[1][6]. You can do this by using .iloc.
import numpy as np
import pandas as pd

# spin up a df
df = pd.DataFrame(np.random.randint(0,10,size=(7, 7)), columns=list('ABCDEFG'))
print(df)
Output:
A B C D E F G
0 0 7 7 8 8 2 2
1 8 2 9 1 6 8 1
2 5 3 5 5 9 2 7
3 7 4 2 1 1 5 0
4 0 4 4 1 9 7 1
5 4 2 7 7 9 7 2
6 0 6 7 8 1 4 1
Now use .iloc to index by integer:
df.iloc[1,6]
Output:
1
To return to your code above: .iloc takes the row position first and the column position second, so df1[1][6] (column 1, row 6) would most likely become the following:
g_int_c = str(df1.iloc[6, 1])
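The argument order is easy to mix up, so here is a small sketch on made-up data. Chained indexing df[col][row] names the column first, while .iloc and .loc both take the row first. The frame below has integer column labels 0 to 6 purely for illustration.

```python
import numpy as np
import pandas as pd

# 7x7 frame holding 0..48, with default integer labels on both axes.
df = pd.DataFrame(np.arange(49).reshape(7, 7))

chained = df[1][6]           # column label 1, then row label 6
by_position = df.iloc[6, 1]  # row position 6, column position 1
by_label = df.loc[6, 1]      # row label 6, column label 1

# Swapping the arguments selects a different cell.
swapped = df.iloc[1, 6]      # row position 1, column position 6
```

Here chained, by_position and by_label all refer to the same cell, while swapped does not.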
For general references, here's the documentation on indexing and slicing dataframes: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer
And this Q&A might be helpful: How to get column by number in Pandas?
Upvotes: 2