Reputation: 71
I want to create a plot that looks like the plot attached below.
My data frame has this format:
   Playlist  Type       Streams
0  a         classical  94
1  b         hip-hop    12
2  c         classical  8
The 'popularity' variable in the plot can be replaced by 'Streams'. The only catch is that the Streams variable has a high variance (values go from 0 to 10,000+), so I suspect the density graph might look odd.
My first question, however, is: how can I plot a graph like this in Pandas, grouping by the 'Type' column and then drawing a density curve for each group?
I tried various methods but did not find a good one to achieve my goal.
Upvotes: 1
Views: 10588
Reputation: 994
To augment @Student240's answer, you could use the seaborn library, which makes it easy to fit kernel density estimates (KDEs). In other words, you get smooth curves like those in your question rather than a binned histogram. This is done with the kdeplot function. A related plot type is distplot, which draws the KDE and also shows the histogram bins (distplot has been deprecated since seaborn 0.11; histplot with kde=True is its replacement).
Another difference in my answer is that I use the explicit object-oriented approach in matplotlib/seaborn. This involves first creating figure and axes objects with plt.subplots(), rather than using the implicit approach of plt.hist. See this really good tutorial for more details.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
## This block of code is copied from Student240's answer:
import random
categories = ['classical','hip-hop','indiepop','indierock','jazz'
,'metal','pop','rap','rock']
# NB I use a slightly different random variable assignment to introduce a bit more variety in my random numbers.
df = pd.DataFrame({'Type':[random.choice(categories) for _ in range(1000)],
'stream':[random.normalvariate(i,random.randint(0,15)) for i in
range(1000)]})
###split the data into groups based on types
g = df.groupby('Type')
## From here things change as I make use of the seaborn library
classical = g.get_group('classical')
hiphop = g.get_group('hip-hop')
indiepop = g.get_group('indiepop')
indierock = g.get_group('indierock')
fig, ax = plt.subplots()
ax = sns.kdeplot(data=classical['stream'], label='classical streams', ax=ax)
ax = sns.kdeplot(data=hiphop['stream'], label='hiphop streams', ax=ax)
ax = sns.kdeplot(data=indiepop['stream'], label='indiepop streams', ax=ax)
# For this final one I use the fill option just to show how it is done
# (seaborn versions before 0.11 call this option shade):
ax = sns.kdeplot(data=indierock['stream'], label='indierock streams', ax=ax, fill=True)
ax.set_xlabel('Count')
ax.set_ylabel('Density')
ax.set_title('KDE plot example from seaborn')
ax.legend()
Upvotes: 5
Reputation: 88
Hi, you can try the following example. I have used random normal values just for this example; obviously, real streams could not be negative. Disclaimer over, here is the code:
import random
import matplotlib.pyplot as plt
import pandas as pd
categories = ['classical','hip-hop','indiepop','indierock','jazz'
,'metal','pop','rap','rock']
df = pd.DataFrame({'Type':[random.choice(categories) for _ in range(10000)],
'stream':[random.normalvariate(0,random.randint(0,15)) for _ in
range(10000)]})
###split the data into groups based on types
g = df.groupby('Type')
###access the classical group
classical = g.get_group('classical')
plt.figure(figsize=(15,6))
plt.hist(classical.stream, histtype='stepfilled', bins=50, alpha=0.2,
label="Classical Streams", color="#D73A30", density=True)
plt.legend(loc="upper left")
###hip hop
hiphop = g.get_group('hip-hop')
plt.hist(hiphop.stream, histtype='stepfilled', bins=50, alpha=0.2,
label="hiphop Streams", color="#2A3586", density=True)
plt.legend(loc="upper left")
###indie pop
indiepop = g.get_group('indiepop')
plt.hist(indiepop.stream, histtype='stepfilled', bins=50, alpha=0.2,
label="indie pop streams", color="#5D271B", density=True)
plt.legend(loc="upper left")
#indierock
indierock = g.get_group('indierock')
plt.hist(indierock.stream, histtype='stepfilled', bins=50, alpha=0.2,
label="indie rock Streams", color="#30A9D7", density=True)
plt.legend(loc="upper left")
##jazz
jazz = g.get_group('jazz')
plt.hist(jazz.stream, histtype='stepfilled', bins=50, alpha=0.2,
label="jazz Streams", color="#2E8B57", density=True)
plt.legend(loc="upper left")
####you can add other genres here if you wish
##modify this to control x-axis, possibly useful for high-variance data
plt.xlim([-20,20])
plt.title('Distribution of Streams by Genre')
plt.xlabel('Count')
plt.ylabel('Density')
You can Google 'hex color picker' if you want to get a specific color in the '#000000' format I have used in this example.
Modify the 'alpha' argument if you want to change how opaque the fill color is. You can also play around with 'bins': adjust it if 50 is too large or too small for your data.
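For stream counts that really run from 0 to 10,000+, clipping the x-axis with xlim hides most of the data. One possible alternative (my own suggestion, not something the question requires) is to plot log(1 + streams) and to share a single set of bin edges across all genres so the histograms are directly comparable. A rough sketch with made-up, skewed data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs headless; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed only so the example is reproducible
categories = ['classical', 'hip-hop', 'jazz']

# Made-up, heavily skewed non-negative stream counts (an assumption for illustration).
df = pd.DataFrame({
    'Type': rng.choice(categories, size=3000),
    'stream': rng.lognormal(mean=4.0, sigma=2.0, size=3000).round(),
})

# log1p(x) = log(1 + x) is defined at 0, unlike a plain log.
df['log_stream'] = np.log1p(df['stream'])

# 41 edges give 40 bins; every genre uses the same edges.
bins = np.linspace(0, df['log_stream'].max(), 41)

fig, ax = plt.subplots(figsize=(15, 6))
for name, group in df.groupby('Type'):
    ax.hist(group['log_stream'], bins=bins, histtype='stepfilled',
            alpha=0.2, density=True, label=f'{name} streams')

ax.set_xlabel('log(1 + streams)')
ax.set_ylabel('Density')
ax.legend(loc='upper left')
# Call plt.show() or fig.savefig(...) to look at the result.
```

The trade-off is that the x-axis is no longer in raw stream counts, so the label should say so.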
I hope this helps. Plotting in matplotlib can be a pain to learn, but it is surely worth it!
Upvotes: 3