Reputation: 121
I have an excel file containing multiple worksheets. Each worksheet contains price & inventory data for individual item codes for a particular month.
for example...
sheetname = 201509
code price inventory
5001 5 92
5002 7 50
5003 6 65
sheetname = 201508
code price inventory
5001 8 60
5002 10 51
5003 6 61
Using pandas dataframe, how is the best way to import this data, organized by time and item code. I need this dataframe to eventually be able to graph changes in price&inventory for item code 5001 for example.
I would appreciate your help. I am still new to python/pandas. Thanks.
My solution... Here is a solution I found to my problem.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
D201509 = pd.read_excel('ExampleSpreadsheet.xlsx', sheetname='201509', index_col='Code')
D201508 = pd.read_excel('ExampleSpreadsheet.xlsx', sheetname='201508', index_col='Code')
D201507 = pd.read_excel('ExampleSpreadsheet.xlsx', sheetname='201507', index_col='Code')
D201506 = pd.read_excel('ExampleSpreadsheet.xlsx', sheetname='201506', index_col='Code')
D201505 = pd.read_excel('ExampleSpreadsheet.xlsx', sheetname='201505', index_col='Code')
total = pd.concat(dict(D201509=D201509, D201508=D201508, D201507=D201507, D201506=D201506, D201505=D201505), axis=1)
total.head()
which will nicely produce this dataframe with hierarchical columns..
Now my new question is how would you plot the change in prices for every code # with this dataframe? I want to see 5 lines (5001,5002,5003,5004,5005), with the x axis being the time (D201505, D201506, etc) and the y axis being the price value.
Thanks.
Upvotes: 1
Views: 4588
Reputation: 3009
This will get your data into a data frame and do a scatter plot on 5001
import pandas as pd
import matplotlib.pyplot as plt
import xlrd
file = r'C:\dickster\data.xlsx'
list_dfs = []
xls = xlrd.open_workbook(file, on_demand=True)
for sheet_name in xls.sheet_names():
df = pd.read_excel(file,sheet_name)
df['time'] = sheet_name
list_dfs.append(df)
dfs = pd.concat(list_dfs,axis=0)
dfs = dfs.sort(['time','code'])
which looks like:
code price inventory time
0 5001 8 60 201508
1 5002 10 51 201508
2 5003 6 61 201508
0 5001 5 92 201509
1 5002 7 50 201509
2 5003 6 65 201509
And now the plot of 5001: price v inventory:
dfs[dfs['code']==5001].plot(x='price',y='inventory',kind='scatter')
plt.show()
which produces:
Upvotes: 2