Reputation: 433
I have a pandas dataframe like the following:
df = pd.DataFrame({ 'a_wood' : np.random.randn(100),
'a_grassland' : np.random.randn(100),
'a_settlement' : np.random.randn(100),
'b_wood' : np.random.randn(100),
'b_grassland' : np.random.randn(100),
'b_settlement' : np.random.randn(100)})
and I want to create histograms of this data, with each dataframe column in its own subplot.
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')
m = 0
for i in range(2):
    for j in range(3):
        df.hist(column=df.columns[m], bins=12, ax=ax[i, j], figsize=(20, 18))
        m += 1
For that, the previous code works perfectly, but now I want to combine every a and b header (e.g. "a_wood" and "b_wood") into one subplot, so there would be just three histograms. I tried assigning two columns with df.columns[[m,m+3]]
but this doesn't work. I also have an index column with strings like "day_1", which I want to be on the x-axis. Can someone help me?
Upvotes: 14
Views: 41144
Reputation: 961
I don't know if I understood your question correctly, but something like this can combine the plots. You might want to play around a little with the alpha and change the headers.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({'a_wood' : np.random.randn(100),
'a_grassland' : np.random.randn(100),
'a_settlement' : np.random.randn(100),
'b_wood' : np.random.randn(100),
'b_grassland' : np.random.randn(100),
'b_settlement' : np.random.randn(100)})
fig, ax = plt.subplots(1, 3, sharex='col', sharey='row', figsize=(20, 18))
n = 3
n_bins = 12
for i in range(n):
    # Get the minimum and maximum over each column pair, e.g. column 0 (a_wood) and column 3 (b_wood)
    min_value = df.iloc[:, [i, i+n]].min().min()
    max_value = df.iloc[:, [i, i+n]].max().max()
    # n_bins + 1 equally spaced edges between min_value and max_value give n_bins bins
    bins = np.linspace(min_value, max_value, n_bins + 1)
    df.hist(column=df.columns[i], bins=bins, ax=ax[i], alpha=0.5, color='red')
    df.hist(column=df.columns[i+n], bins=bins, ax=ax[i], alpha=0.5, color='blue')
    ax[i].set_title(df.columns[i][2:])  # strip the 'a_' prefix from the title
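The shared bin edges for a column pair can also be computed in a single call with np.histogram_bin_edges, which takes the number of bins rather than the number of edges. This is only a minimal sketch, assuming NumPy 1.17 or later; the two-column frame, the seed and the name `pair` are made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
df = pd.DataFrame({'a_wood': rng.standard_normal(100),
                   'b_wood': rng.standard_normal(100)})

# Pool both columns so the edges span the range of the whole pair
pair = df[['a_wood', 'b_wood']].to_numpy().ravel()
edges = np.histogram_bin_edges(pair, bins=12)  # 13 edges describe 12 bins

# Count each column against the same edges so the bars line up
counts_a, _ = np.histogram(df['a_wood'], bins=edges)
counts_b, _ = np.histogram(df['b_wood'], bins=edges)
```

The `edges` array can be passed as `bins=` to either `df.hist` or `ax.hist`.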
To plot them both next to each other, try this:
# We do not have to compute the bin edges here: hist() bins both columns over their common range
fig, ax = plt.subplots(1, 3, sharex='col', sharey='row', figsize=(20, 18))
n = 3
colors = ['red', 'blue']
axes = ax.flatten()
for i, j in zip(range(n), axes):
    j.hist([df.iloc[:, i], df.iloc[:, i+n]], bins=12, color=colors)
    j.set_title(df.columns[i][2:])
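Both snippets above pair the columns by position (i and i+n), which depends on the column order. If the order is not guaranteed, the pairs could instead be built from the names. The following is only a sketch under the assumption that every column is called a_<name> or b_<name>; the `suffixes` and `pair` variables are illustrative, and the Agg backend is selected just so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only needed to run headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
names = ['a_wood', 'a_grassland', 'a_settlement',
         'b_wood', 'b_grassland', 'b_settlement']
df = pd.DataFrame({name: rng.standard_normal(100) for name in names})

# Unique suffixes ('wood', 'grassland', ...) in order of first appearance
suffixes = list(dict.fromkeys(col.split('_', 1)[1] for col in df.columns))

fig, axes = plt.subplots(1, len(suffixes), sharey='row', figsize=(12, 4))
for ax, suffix in zip(axes, suffixes):
    pair = ['a_' + suffix, 'b_' + suffix]
    # A list of datasets makes hist() draw the bars side by side
    ax.hist([df[c].to_numpy() for c in pair], bins=12,
            color=['red', 'blue'], label=pair)
    ax.set_title(suffix)
    ax.legend(loc='upper left')
```

Looking the columns up by name also makes the legend labels fall out for free.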
Upvotes: 12
Reputation: 81
You want something that loops through each column and plots its data in a histogram, right? I can suggest a few modifications that you can reuse in future code. Before giving the code, here are a few useful tips:
ax.ravel() flattens the grid of axes into a one-dimensional array, so each subplot can be addressed with a single index.
enumerate() is always useful for looping through an object while having both the i-th element and its index available at the same time.
Here is my code proposal:
fig, ax = plt.subplots(1, 3, sharex='col', sharey='row', figsize=(12,7))
ax = ax.ravel()
# ravel() flattens a 2-D grid of axes (e.g. 2x3) into a 1-D array (6 axes),
# so each subplot can be indexed with a single number as below
for idx in range(3):
    # label each histogram so that legend() has entries to show
    ax[idx].hist(df.iloc[:,idx], bins=12, alpha=0.5, label=df.columns[idx])
    ax[idx].hist(df.iloc[:,idx+3], bins=12, alpha=0.5, label=df.columns[idx+3])
    ax[idx].set_title(df.columns[idx]+' with '+df.columns[idx+3])
    ax[idx].legend(loc='upper left')
I hope this is helpful; feel free to ask me questions if you need more details :)
NOTE: I reused Alex's answer to edit my answer. Also check this matplotlib documentation for more details. In this specific case, point 3 is no longer relevant.
Upvotes: 8