Reputation:
I need to draw a parallel coordinates plot of a dataset whose variables have very different ranges. While searching, I found a beautiful JavaScript example on this website.
I have created a sample dataset for testing and would like to produce a parallel plot with y-axis ticks and a separate range on each y-axis, something similar to this image:
So far I have done this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
np.random.seed(100)
%matplotlib inline
df = pd.DataFrame({'calcium': np.random.randint(0,7,5),
'calories': np.random.randint(200,900,5),
'fiber': np.random.randint(10,75,5),
'potassium': np.random.randint(0,20,5)
})
df = df.T
df['name'] = df.index
df = df.reset_index(drop=True)  # reset_index returns a new frame, so assign it back
parallel_coordinates(df,'name')
The output is this:
As we can see, the lower curves are almost impossible to tell apart. I would like to fix that. I have searched for a way to change the tick marks on each vertical axis and to rescale (normalize) the ranges.
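For reference, the normalization I have in mind could look roughly like the sketch below. It assumes one row per item and one column per variable (so not the transposed frame above), and `min_max_normalize` is a hypothetical helper name, not a pandas function. Each numeric column is rescaled to [0, 1] so that no single column dominates the shared y-axis.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame({'calcium': np.random.randint(0, 7, 5),
                   'calories': np.random.randint(200, 900, 5),
                   'fiber': np.random.randint(10, 75, 5),
                   'potassium': np.random.randint(0, 20, 5),
                   'name': ['apple', 'banana', 'orange', 'mango', 'watermelon']})

def min_max_normalize(df, name_col):
    """Hypothetical helper: rescale every numeric column to the range [0, 1]."""
    values = df.drop(columns=name_col).astype(float)
    # A constant column has zero span; use 1 instead to avoid dividing by zero.
    span = (values.max() - values.min()).replace(0, 1)
    scaled = (values - values.min()) / span
    scaled[name_col] = df[name_col]  # put the label column back
    return scaled

normalized = min_max_normalize(df, 'name')
# The result can be passed to pandas.plotting.parallel_coordinates(normalized, 'name').
```

This makes the lines comparable, but the tick labels then show 0 to 1 rather than the original units, which is why I would still like separate ticks per axis.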
Any help would be appreciated. This is a beautiful plot, so kudos to anyone on planet Earth who manages to reproduce it in Python!
Related links:
http://bl.ocks.org/syntagmatic/raw/3150059/
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.plotting.parallel_coordinates.html
https://pandas.pydata.org/pandas-docs/stable/visualization.html
How to plot parallel coordinates on pandas DataFrame with some columns containing strings?
Update
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
np.random.seed(100)
plt.style.use('ggplot')
%matplotlib inline
df = pd.DataFrame({'calcium': np.random.randint(0,7,5),
'calories': np.random.randint(200,900,5),
'fiber': np.random.randint(10,75,5),
'potassium': np.random.randint(0,20,5),
'name': ['apple','banana','orange','mango','watermelon']
})
ax = parallel_coordinates(df,'name')
ax.grid(True)
ax.set_yscale('log')
I still cannot put y-tick marks on the middle axes.
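One direction that might work, sketched here under assumptions rather than as a confirmed solution: rescale each column to [0, 1] for drawing, then add one `twinx()` axis per column, set its limits to that column's real range, and move its right spine to the column's x position. `parallel_axes` is a hypothetical helper name, and the `matplotlib.use("Agg")` line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame({'calcium': np.random.randint(0, 7, 5),
                   'calories': np.random.randint(200, 900, 5),
                   'fiber': np.random.randint(10, 75, 5),
                   'potassium': np.random.randint(0, 20, 5),
                   'name': ['apple', 'banana', 'orange', 'mango', 'watermelon']})

def parallel_axes(df, name_col):
    """Hypothetical helper: parallel plot with one tick scale per column."""
    cols = [c for c in df.columns if c != name_col]
    lo = df[cols].min().astype(float)
    hi = df[cols].max().astype(float)
    hi = hi.where(hi > lo, lo + 1)        # avoid a zero-height axis
    scaled = (df[cols] - lo) / (hi - lo)  # each column mapped onto [0, 1]

    fig, host = plt.subplots(figsize=(8, 4))
    x = np.arange(len(cols))
    for label, (_, row) in zip(df[name_col], scaled.iterrows()):
        host.plot(x, row.to_numpy(dtype=float), label=label)
    host.set_xlim(0, len(cols) - 1)
    host.set_ylim(0, 1)
    host.set_xticks(x)
    host.set_xticklabels(cols)
    host.yaxis.set_visible(False)         # hide the internal 0-1 scale

    axes = []
    for i, col in enumerate(cols):
        ax = host.twinx()                 # shares x with host, has its own y scale
        ax.set_ylim(lo[col], hi[col])     # ticks now show the column's real units
        ax.spines['right'].set_position(('data', i))  # move the spine to x = i
        axes.append(ax)
    host.legend(loc='upper left', bbox_to_anchor=(1.08, 1))
    return fig, axes

fig, axes = parallel_axes(df, 'name')
```

Because the lines are drawn on the hidden 0-1 scale while every twin axis spans its column's minimum to maximum, each line crosses each vertical axis at the height matching its original value.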
Upvotes: 4
Views: 2583
Reputation: 489
This solution improves readability by using a broken y-axis. I stole most of this code from here.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(100)
%matplotlib inline
df = pd.DataFrame({'calcium': np.random.randint(0,7,5),
'calories': np.random.randint(200,900,5),
'fiber': np.random.randint(10,75,5),
'potassium': np.random.randint(0,20,5)
})
f, (ax, ax2) = plt.subplots(2, 1, sharex=True)
#plot the same data on both axes
ax.plot(df)
ax2.plot(df)
# zoom-in / limit the view to different portions of the data
ax.set_ylim(250, 800) # outliers only
ax2.set_ylim(0, 75) # most of the data
# hide the spines between ax and ax2
ax.spines['bottom'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax.xaxis.tick_top()
ax.tick_params(labeltop=False) # don't put tick labels at the top (current matplotlib expects a bool, not 'off')
ax2.xaxis.tick_bottom()
d = .015 # how big to make the diagonal lines in axes coordinates
kwargs = dict(transform=ax.transAxes, color='k', clip_on=False)
ax.plot((-d, +d), (-d, +d), **kwargs) # top-left diagonal
ax.plot((1 - d, 1 + d), (-d, +d), **kwargs) # top-right diagonal
kwargs.update(transform=ax2.transAxes) # switch to the bottom axes
ax2.plot((-d, +d), (1 - d, 1 + d), **kwargs) # bottom-left diagonal
ax2.plot((1 - d, 1 + d), (1 - d, 1 + d), **kwargs) # bottom-right diagonal
f.subplots_adjust(left=0.1, right=1.6,
bottom=0.1, top = 0.9,
hspace=0.3) # space between the two sections
f.legend(df.columns)
plt.show()
Which produces a plot that looks like this:
I still think the calcium line is hard to read, but you could enlarge the image or break the y-axis again if the graph is simple enough to split into chunks.
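Breaking the axis again could be sketched as follows. This is a rough generalization of the code above, not something I have tuned: `plot_with_breaks` is a hypothetical helper, the three y-ranges are illustrative guesses based on how the sample data is generated, and the diagonal break marks are left out for brevity. The `matplotlib.use("Agg")` line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame({'calcium': np.random.randint(0, 7, 5),
                   'calories': np.random.randint(200, 900, 5),
                   'fiber': np.random.randint(10, 75, 5),
                   'potassium': np.random.randint(0, 20, 5)})

def plot_with_breaks(df, y_ranges):
    """Hypothetical helper: one stacked panel per (low, high) range, top panel first."""
    fig, axes = plt.subplots(len(y_ranges), 1, sharex=True, squeeze=False)
    axes = axes[:, 0]  # squeeze=False always returns a 2-D array of Axes
    for ax, (low, high) in zip(axes, y_ranges):
        ax.plot(df.index, df.to_numpy())  # same data on every panel
        ax.set_ylim(low, high)            # each panel shows only its own slice
    for ax in axes[:-1]:                  # hide the edge facing the panel below
        ax.spines['bottom'].set_visible(False)
        ax.tick_params(bottom=False, labelbottom=False)
    for ax in axes[1:]:                   # hide the edge facing the panel above
        ax.spines['top'].set_visible(False)
    fig.legend(axes[0].get_lines(), list(df.columns))
    return fig, axes

# Assumed ranges: calories on top, fiber in the middle, calcium and potassium below.
fig, axes = plot_with_breaks(df, [(200, 900), (20, 75), (0, 20)])
```

With the bottom panel limited to 0 to 20, the calcium line gets far more vertical room than in the two-panel version.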
Upvotes: 1