Reputation: 101
With the help of some wonderful people around here, I was able to generate my first box plots in seaborn. I have 2 separate seaborn plots that show two comparisons from an Excel sheet. What I want to do now is present both data comparisons (the 2 columns shown below) on the same plot, essentially creating a grouped boxplot. I tried to convert the data to DataFrames, concat, and melt it, but was unsuccessful. I am pretty new to Python, so I was wondering if you all could help me out. Below is the code I have so far.
import pandas as pd
import numpy as np
import xlrd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from pandas import ExcelWriter
from pandas import ExcelFile
from pandas import DataFrame
excel_file = 'Project File Merger.xlsm'
list_dfs = []
xls = xlrd.open_workbook(excel_file,on_demand=True)
sheet_names = xls.sheet_names()
d_data = {}
for i, sheet_name in enumerate(xls.sheet_names()):
    df = pd.read_excel(excel_file,sheet_name)
    d_data[sheet_names[i]] = df.loc[:,['HMB','PSPPM']]
keys = list(d_data.keys())
values_list1 = list(d_data.values())
print(keys[0])
print(values_list1[0])
Which returns
Check1.xlsm
         HMB     PSPPM
0   0.141005  0.429498
1   0.141005  0.429498
2   0.066071  0.706797
3        NaN  0.080378
4   0.045815  0.004076
5        NaN  0.630156
6        NaN  0.723957
7        NaN  0.712118
8   0.391531  0.791329
9   0.036823  0.506834
10  0.391531  0.791329
Now this is where I am stuck. values_list1 has 17 elements (one for each sheet in the Excel file), and I would like the data from each sheet to be grouped together on the plot. I think I might be running into a problem because each list element is a DataFrame with 2 columns? Any suggestions would be appreciated!
Upvotes: 0
Views: 623
Reputation: 40667
I'm not entirely sure I understand your problem, in particular how it relates to boxplots. But as far as I can tell, you have a dictionary with the names of your Excel sheets as the keys and a DataFrame as each value, and you want to merge all these DataFrames into a single one so you can plot all the values together?
If that's correct, then a simple pd.concat can accept a dictionary and generate a new DataFrame with the keys as the outer index level. You can then use reset_index() to flatten out the DataFrame:
new_df = pd.concat(d_data).reset_index()
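To see what that produces, here is a minimal sketch on made-up data. The sheet names and values below are placeholders I invented, not your real workbook. The dictionary keys end up in a column called level_0 and the original row index in level_1:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for d_data: two "sheets", each with HMB and PSPPM columns
d_data = {
    "Check1": pd.DataFrame({"HMB": [0.14, np.nan, 0.39], "PSPPM": [0.43, 0.08, 0.79]}),
    "Check2": pd.DataFrame({"HMB": [0.07, 0.05, np.nan], "PSPPM": [0.71, 0.63, 0.51]}),
}

# Dict keys become the outer index level; reset_index() turns both levels into columns
new_df = pd.concat(d_data).reset_index()
print(new_df.columns.tolist())  # ['level_0', 'level_1', 'HMB', 'PSPPM']
```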
After that, I don't know how you want to draw your boxplot, but you could, for example, draw the values of one of your columns for each of the sheets:
sns.boxplot(x='level_0', y='HMB', data=new_df)
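If what you are after is a grouped boxplot with HMB and PSPPM side by side for each sheet, one possible approach is to melt the two columns into long form and pass the variable name as hue. This is only a sketch that I have not run against your file: the data is randomly generated, and the names sheet, row, measure and value are my own choices rather than anything from your code.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for d_data: three "sheets" of random values
rng = np.random.default_rng(0)
d_data = {
    f"Check{i}": pd.DataFrame(rng.random((10, 2)), columns=["HMB", "PSPPM"])
    for i in range(1, 4)
}

# names= labels the two index levels, so reset_index() yields "sheet" and "row"
new_df = pd.concat(d_data, names=["sheet", "row"]).reset_index()

# Long form: one row per (sheet, measure) observation
long_df = new_df.melt(
    id_vars="sheet",
    value_vars=["HMB", "PSPPM"],
    var_name="measure",
    value_name="value",
)

# One group per sheet on the x axis, one box per measure within each group
ax = sns.boxplot(x="sheet", y="value", hue="measure", data=long_df)
```

With 17 sheets the x axis will get crowded, so you may want to rotate the tick labels or widen the figure.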
Upvotes: 1