user11395824

Reputation: 71

Density Plot Python Pandas

I want to create a plot that looks like the plot attached below.

My data frame has this format:

   Playlist  Type        Streams
0  a         classical   94  
1  b         hip-hop     12
2  c         classical   8

The 'popularity' category can be replaced by 'Streams'. The only issue is that the Streams variable has a very wide range of values (from 0 to 10,000+), so I believe the density graph might look weird.

However, my first question is: how can I plot a graph similar to this in Pandas, grouping by the 'Type' column and then drawing a density curve for each group?

I tried various methods but did not find one that achieves this.

[image of the desired plot]

Upvotes: 1

Views: 10588

Answers (2)

Robert King

Reputation: 994

To augment the answer of @Student240, you could use the seaborn library, which makes it easy to fit kernel density estimates (KDEs). In other words, you get smooth curves similar to those in your question rather than a binned histogram. This is done with the kdeplot function. A related plot type is histplot with kde=True (distplot in older seaborn versions), which draws the KDE together with the histogram bins.

Another difference in my answer is that I use the explicit object-oriented approach in matplotlib/seaborn. This involves first creating figure and axes objects with plt.subplots() rather than relying on the implicit approach of plt.hist. See this really good tutorial for more details.

import matplotlib.pyplot as plt
import pandas as pd  # needed for pd.DataFrame below
import seaborn as sns

## This block of code is copied from Student240's answer:
import random 

categories = ['classical','hip-hop','indiepop','indierock','jazz'
          ,'metal','pop','rap','rock']

# NB I use a slightly different random variable assignment to introduce a bit more variety in my random numbers.
df = pd.DataFrame({'Type':[random.choice(categories) for _ in range(1000)],
              'stream':[random.normalvariate(i,random.randint(0,15)) for i in 
               range(1000)]})


###split the data into groups based on types
g = df.groupby('Type')

## From here things change as I make use of the seaborn library
classical = g.get_group('classical')
hiphop = g.get_group('hip-hop')
indiepop = g.get_group('indiepop')
indierock = g.get_group('indierock')
fig, ax = plt.subplots()

ax = sns.kdeplot(data=classical['stream'], label='classical streams', ax=ax)
ax = sns.kdeplot(data=hiphop['stream'], label='hiphop streams', ax=ax)
ax = sns.kdeplot(data=indiepop['stream'], label='indiepop streams', ax=ax)

# for this final one I use the fill option (called shade in older seaborn versions) just to show how it is done:
ax = sns.kdeplot(data=indierock['stream'], label='indierock streams', ax=ax, fill=True)

ax.set_xlabel('Count')
ax.set_ylabel('Density')
ax.set_title('KDE plot example from seaborn')
# draw the legend so that the label= values above are shown
ax.legend()


Upvotes: 5

Student240

Reputation: 88

Hi, you can try the following example. I have used random normals just for this example; obviously it wouldn't be possible to have negative streams. Anyway, disclaimer over, here is the code:

import random

import matplotlib.pyplot as plt
import pandas as pd

categories = ['classical','hip-hop','indiepop','indierock','jazz'
          ,'metal','pop','rap','rock']

df = pd.DataFrame({'Type':[random.choice(categories) for _ in range(10000)],
              'stream':[random.normalvariate(0,random.randint(0,15)) for _ in 
               range(10000)]})

###split the data into groups based on types
g = df.groupby('Type')



###access the classical group 
classical = g.get_group('classical')
plt.figure(figsize=(15,6))
plt.hist(classical.stream, histtype='stepfilled', bins=50, alpha=0.2,
     label="Classical Streams", color="#D73A30", density=True)
plt.legend(loc="upper left")

###hip hop

hiphop = g.get_group('hip-hop')

plt.hist(hiphop.stream, histtype='stepfilled', bins=50, alpha=0.2,
     label="hiphop Streams", color="#2A3586", density=True)
plt.legend(loc="upper left")

###indie pop
indiepop = g.get_group('indiepop')

plt.hist(indiepop.stream, histtype='stepfilled', bins=50, alpha=0.2,
     label="indie pop streams", color="#5D271B", density=True)
plt.legend(loc="upper left")


#indierock

indierock = g.get_group('indierock')

plt.hist(indierock.stream, histtype='stepfilled', bins=50, alpha=0.2,
     label="indie rock Streams", color="#30A9D7", density=True)
plt.legend(loc="upper left")


##jazz
jazz = g.get_group('jazz')
plt.hist(jazz.stream, histtype='stepfilled', bins=50, alpha=0.2,
     label="jazz Streams", color="#30A9D7", density=True)
plt.legend(loc="upper left")


####you can add other here if you wish

##modify this to control x-axis, possibly useful for high-variance data
plt.xlim([-20,20])

plt.title('Distribution of Streams by Genre')
plt.xlabel('Count')
plt.ylabel('Density')


You can Google 'Hex color picker' if you want to get a specific '#000000' color in the format I have used in this example.

Modify the 'alpha' argument if you want to change how opaque the colors are. You can also play around with 'bins' in the example I provided, which should let you make the plot look better if 50 is too large or too small.
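The repeated per-genre blocks in the code above could also be written as a single loop over the groupby object. This is just a sketch of that idea, reusing the same random data recipe with a shorter category list and a fixed seed (both my assumptions). It lets matplotlib pick the colors automatically instead of using explicit hex codes.

```python
import random

import matplotlib.pyplot as plt
import pandas as pd

random.seed(0)  # fixed seed so the sketch is reproducible

categories = ['classical', 'hip-hop', 'indiepop', 'indierock', 'jazz']
df = pd.DataFrame({
    'Type': [random.choice(categories) for _ in range(10000)],
    'stream': [random.normalvariate(0, random.randint(0, 15))
               for _ in range(10000)],
})

fig, ax = plt.subplots(figsize=(15, 6))

# Iterating over a groupby yields (group name, sub-frame) pairs,
# so one hist call covers every genre
for genre, group in df.groupby('Type'):
    ax.hist(group['stream'], histtype='stepfilled', bins=50, alpha=0.2,
            label=f'{genre} streams', density=True)

ax.legend(loc='upper left')
ax.set_xlim(-20, 20)
ax.set_title('Distribution of Streams by Genre')
ax.set_xlabel('Count')
ax.set_ylabel('Density')
```

Adding another genre then only requires that it appears in the 'Type' column; no extra plotting code is needed.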

I hope this helps, plotting in matplotlib can be a pain to learn, but it is surely worth it!!

Upvotes: 3
