Nick Mueller
Nick Mueller

Reputation: 121

Pandas combine multiple excel worksheets on specific index

I have an excel file containing multiple worksheets. Each worksheet contains price & inventory data for individual item codes for a particular month.

for example...

sheetname = 201509

code price inventory 
5001  5       92
5002  7       50
5003  6       65

sheetname = 201508

code price inventory
5001  8       60
5002  10      51
5003  6       61

Using pandas dataframe, how is the best way to import this data, organized by time and item code. I need this dataframe to eventually be able to graph changes in price&inventory for item code 5001 for example.

I would appreciate your help. I am still new to python/pandas. Thanks.


My solution... Here is a solution I found to my problem.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

D201509 = pd.read_excel('ExampleSpreadsheet.xlsx', sheetname='201509', index_col='Code')
D201508 = pd.read_excel('ExampleSpreadsheet.xlsx', sheetname='201508', index_col='Code')
D201507 = pd.read_excel('ExampleSpreadsheet.xlsx', sheetname='201507', index_col='Code')
D201506 = pd.read_excel('ExampleSpreadsheet.xlsx', sheetname='201506', index_col='Code')
D201505 = pd.read_excel('ExampleSpreadsheet.xlsx', sheetname='201505', index_col='Code')

total = pd.concat(dict(D201509=D201509, D201508=D201508, D201507=D201507, D201506=D201506, D201505=D201505), axis=1)

total.head()

which will nicely produce this dataframe with hierarchical columns..

enter image description here

Now my new question is how would you plot the change in prices for every code # with this dataframe? I want to see 5 lines (5001,5002,5003,5004,5005), with the x axis being the time (D201505, D201506, etc) and the y axis being the price value.

Thanks.

Upvotes: 1

Views: 4588

Answers (1)

Dickster
Dickster

Reputation: 3009

This will get your data into a data frame and do a scatter plot on 5001

import pandas as pd
import matplotlib.pyplot as plt
import xlrd

file = r'C:\dickster\data.xlsx'
list_dfs = []

xls = xlrd.open_workbook(file, on_demand=True)
for sheet_name in xls.sheet_names():
    df = pd.read_excel(file,sheet_name)
    df['time'] = sheet_name
    list_dfs.append(df)

dfs = pd.concat(list_dfs,axis=0)
dfs = dfs.sort(['time','code'])

which looks like:

   code  price  inventory    time
0  5001      8         60  201508
1  5002     10         51  201508
2  5003      6         61  201508
0  5001      5         92  201509
1  5002      7         50  201509
2  5003      6         65  201509

And now the plot of 5001: price v inventory:

dfs[dfs['code']==5001].plot(x='price',y='inventory',kind='scatter')
plt.show()

which produces:

enter image description here

Upvotes: 2

Related Questions