I am reading a CSV file:
Notation Level RFResult PRIResult PDResult Total Result
AAA 1 1.23 0 2 3.23
AAA 1 3.4 1 0 4.4
BBB 2 0.26 1 1.42 2.68
BBB 2 0.73 1 1.3 3.03
CCC 3 0.30 0 2.73 3.03
DDD 4 0.25 1 1.50 2.75
AAA 5 0.25 1 1.50 2.75
FFF 6 0.26 1 1.42 2.68
...
...
Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('home\NewFiles\Files.csv')
Notation = df['Notation']
Level = df['Level']
RFResult = df['RFResult']
PRIResult = df['PRIResult']
PDResult = df['PDResult']
fig, axes = plt.subplots(nrows=7, ncols=1)
ax1, ax2, ax3, ax4, ax5, ax6, ax7 = axes.flatten()
n_bins = 13
ax1.hist(df['Total'], n_bins, histtype='bar')  # Currently this shows all Total results in one plot
plt.show()
I want to show the Total Result of each Level on a separate axes, as follows:
ax1 will show Level 1 Total Result
ax2 will show Level 2 Total Result
ax3 will show Level 3 Total Result
ax4 will show Level 4 Total Result
ax5 will show Level 5 Total Result
ax6 will show Level 6 Total Result
ax7 will show Level 7 Total Result
You can select a filtered part of a dataframe just by indexing: df[df['Level'] == level]['Total']. You can loop through the axes using for ax in axes.flatten(). To also get the index, use for ind, ax in enumerate(axes.flatten()). Note that Python starts counting from 0, so adding 1 to the index is a good way to get the level.
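A minimal sketch of these two idioms without any plotting, assuming a small DataFrame built by hand from the sample rows in the question (only the Level and Total columns). It also shows the equivalent .loc form of the filter, and enumerate(..., start=1), which yields 1-based level numbers directly instead of ind + 1; the panel names ax1 to ax3 are just labels for illustration.

```python
import pandas as pd

# Level and Total columns of the sample rows shown in the question
df = pd.DataFrame({'Level': [1, 1, 2, 2, 3, 4, 5, 6],
                   'Total': [3.23, 4.4, 2.68, 3.03, 3.03, 2.75, 2.75, 2.68]})

# Boolean indexing: keep the rows of one level, then pick the column
level_1_totals = df[df['Level'] == 1]['Total']
# Equivalent single-step form with .loc (avoids chained indexing)
level_1_totals_loc = df.loc[df['Level'] == 1, 'Total']
print(level_1_totals.tolist())  # [3.23, 4.4]

# enumerate(..., start=1) yields 1-based level numbers directly, instead of ind + 1
rows_per_level = {}
for level, panel_name in enumerate(['ax1', 'ax2', 'ax3'], start=1):
    rows_per_level[panel_name] = len(df[df['Level'] == level])
print(rows_per_level)  # {'ax1': 2, 'ax2': 2, 'ax3': 1}
```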
Note that when a string contains backslashes, you can write it as a raw string (r-string) so the backslashes are not treated as escape characters: r'home\NewFiles\Files.csv'.
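To illustrate why the raw string matters for the path in the question: in a plain Python 3 string, \N starts a named Unicode escape, so the original literal does not even compile. The sketch below compares the raw string with a hand-escaped one and, as an optional alternative, builds the same Windows-style path with pathlib.PureWindowsPath. The relative path itself is copied from the question; whether it is complete is not known.

```python
from pathlib import PureWindowsPath

# With a plain string, 'home\NewFiles\Files.csv' does not compile in Python 3:
# '\N' starts a named-Unicode escape such as '\N{BULLET}'.
raw_path = r'home\NewFiles\Files.csv'       # raw string: backslashes kept as-is
escaped_path = 'home\\NewFiles\\Files.csv'  # same text, each backslash doubled
print(raw_path == escaped_path)  # True

# pathlib can build the same Windows-style path from its parts
print(str(PureWindowsPath('home', 'NewFiles', 'Files.csv')))  # home\NewFiles\Files.csv
```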
The default ylim runs from 0 to the maximum bar height, plus some padding. This can be changed separately for each ax. In the example below, a list of ymax values is used to show the principle.
ax.grid(True, axis='both') turns on the grid for that ax. Instead of 'both', 'x' or 'y' can be used to only draw the grid lines of that axis. A grid line is drawn at each tick value. (The example below uses little vertical space per subplot, so only a few grid lines are visible.)
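The maximum bar height that the default ylim is based on can also be computed without plotting: np.histogram counts the values per bin, which should match the bars ax.hist draws for the same data and number of bins. A sketch, assuming random test data shaped like the example's (levels 1 to 5) and an arbitrary 10% headroom, that derives one ymax per level from the data instead of typing the values in:

```python
import numpy as np
import pandas as pd

# Random test data shaped like the example's (levels 1 to 5)
rng = np.random.default_rng(0)
N = 1000
df = pd.DataFrame({'Level': rng.integers(1, 6, N), 'Total': rng.uniform(1, 5, N)})

n_bins = 13
ymax_per_level = []
for level in range(1, 6):
    # bar heights for this level's Total values
    counts, _ = np.histogram(df.loc[df['Level'] == level, 'Total'], bins=n_bins)
    # tallest bar plus roughly 10% headroom, rounded up
    ymax_per_level.append(int(np.ceil(counts.max() * 1.1)))
print(ymax_per_level)
```

Such a list could stand in for a hard-coded ymax_per_level. If all panels should simply share one y range, plt.subplots(..., sharey=True) is another option.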
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
N = 1000
df = pd.DataFrame({'Level': np.random.randint(1, 6, N), 'Total': np.random.uniform(1, 5, N)})
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True)
ymax_per_level = [27, 29, 28, 26, 27]
for ind, (ax, lev_ymax) in enumerate(zip(axes.flatten(), ymax_per_level)):
    level = ind + 1
    n_bins = 13
    ax.hist(df[df['Level'] == level]['Total'], bins=n_bins, histtype='bar')
    ax.set_ylabel(f'TL={level}')  # to add the level in the ylabel
    ax.set_ylim(0, lev_ymax)
    ax.grid(True, axis='both')
plt.show()
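The loop above assumes the levels are exactly 1, 2, 3, ... with one subplot each. If some levels might be missing, or their number is not known in advance, one option (a different technique from the one above) is to let pandas groupby drive the loop, so each subplot gets one level that actually occurs. A sketch with random test data in which level 4 is deliberately absent:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
N = 1000
# Level 4 is deliberately absent from this test data
df = pd.DataFrame({'Level': rng.choice([1, 2, 3, 5], N), 'Total': rng.uniform(1, 5, N)})

groups = df.groupby('Level')['Total']  # one group per level that actually occurs
# squeeze=False keeps axes 2-D even for a single group, so flatten() always works
fig, axes = plt.subplots(nrows=groups.ngroups, ncols=1, sharex=True, squeeze=False)
plotted_levels = []
for ax, (level, totals) in zip(axes.flatten(), groups):
    ax.hist(totals, bins=13, histtype='bar')
    ax.set_ylabel(f'TL={level}')  # label each panel with its real level
    plotted_levels.append(int(level))
plt.show()
```

Here the panels are labelled TL=1, TL=2, TL=3 and TL=5, with no empty subplot for the missing level.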
PS: A stacked histogram with a custom legend and custom vertical lines could be created as follows:
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import pandas as pd
import numpy as np
N = 1000
df = pd.DataFrame({'Level': np.random.randint(1, 6, N),
                   'RFResult': np.random.uniform(1, 5, N),
                   'PRIResult': np.random.uniform(1, 5, N),
                   'PDResult': np.random.uniform(1, 5, N)})
df['Total'] = df['RFResult'] + df['PRIResult'] + df['PDResult']
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True)
colors = ['crimson', 'limegreen', 'dodgerblue']
column_names = ['RFResult', 'PRIResult', 'PDResult']
level_vertical_line = [1, 2, 3, 4, 5]
for level, (ax, vertical_line) in enumerate(zip(axes.flatten(), level_vertical_line), start=1):
    n_bins = 13
    level_data = df[df['Level'] == level][column_names].to_numpy()
    # vertical_line = level_data.mean()
    ax.hist(level_data, bins=n_bins,
            histtype='bar', stacked=True, color=colors)
    ax.axvline(vertical_line, color='gold', ls=':', lw=2)
    ax.set_ylabel(f'TL={level}')  # to add the level in the ylabel
    ax.margins(x=0.01)
    ax.grid(True, axis='both')
legend_handles = [Patch(color=color) for color in colors]
axes[0].legend(legend_handles, column_names, ncol=len(column_names), loc='lower center', bbox_to_anchor=(0.5, 1.02))
plt.show()
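The commented-out line vertical_line = level_data.mean() hints at placing each vertical line from the data rather than from the hard-coded level_vertical_line list. A sketch, assuming the same kind of random columns, that precomputes one value per level with groupby. Because every column has the same number of rows within a level, the mean of the three column means equals level_data.mean():

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
N = 1000
column_names = ['RFResult', 'PRIResult', 'PDResult']
df = pd.DataFrame({'Level': rng.integers(1, 6, N),
                   **{name: rng.uniform(1, 5, N) for name in column_names}})

# One value per level (groupby sorts the levels): mean of the three column means
level_vertical_line = df.groupby('Level')[column_names].mean().mean(axis=1).tolist()

# Cross-check against the commented-out line inside the loop, for level 1
level_data = df[df['Level'] == 1][column_names].to_numpy()
print(np.isclose(level_vertical_line[0], level_data.mean()))  # True
```

The resulting list can be passed to zip(...) in place of [1, 2, 3, 4, 5].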