David Geismar

Reputation: 3412

Seaborn Countplot : Display only n most frequent categories

I have a Python array listing all occurrences of string labels. Let's call it labels_array. Using seaborn as sns, I'd like to show a countplot of this array:

sns.countplot(labels_array)

This works, but as there are too many different labels in my array, the output doesn't look good.

Is there a way to display only the n most frequent labels?

Upvotes: 0

Views: 11445

Answers (3)

Hamza

Reputation: 301

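You can use pd.Series(labels_array).value_counts() to get your occurrences sorted by frequency (the top-level pd.value_counts() does the same, but is deprecated in recent pandas versions).

To get the N most frequent labels, you can simply write pd.Series(labels_array).value_counts().iloc[:N].index (the index holds the labels).

You can pass that to the order parameter of countplot, and it should look like this (x= is given by keyword because recent seaborn versions no longer accept the data vector positionally):

sns.countplot(x=labels_array, order=pd.Series(labels_array).value_counts().iloc[:N].index)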
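
If pandas is not at hand, the same ordering can probably be built with the standard library alone. The following is only a minimal sketch: top_n_labels is a hypothetical helper name, the sample labels_array is made up for illustration, and the seaborn call is left commented out because it assumes seaborn and matplotlib are installed.

```python
from collections import Counter


def top_n_labels(labels, n):
    """Return the n most frequent labels, most frequent first.

    Hypothetical helper for illustration; not part of seaborn or pandas.
    """
    # Counter.most_common(n) yields (label, count) pairs sorted by count
    return [label for label, _ in Counter(labels).most_common(n)]


# Made-up sample data standing in for the real labels_array
labels_array = ["a", "b", "a", "c", "a", "b", "d"]

order = top_n_labels(labels_array, 2)
print(order)

# Assuming seaborn and matplotlib are available, the list can be passed
# as the `order` argument so only these categories are drawn:
# import seaborn as sns
# import matplotlib.pyplot as plt
# sns.countplot(x=labels_array, order=order)
# plt.show()
```

The resulting list plays the same role as the value_counts index: countplot draws only the categories named in order, in that sequence.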

Upvotes: 5

Stefan Käser

Reputation: 21

I came across the same problem (and this question) and found that this question has already been answered.

The countplot function has an order parameter that lets you specify which values to plot counts for. The most frequent values can be obtained, as previously stated, with the value_counts function.

See: limit the number of groups shown in seaborn countplot?

Upvotes: 1

ImportanceOfBeingErnest

Reputation: 339122

Although countplot computes the counts and so should in principle be able to show only some of them, it does not offer this. Therefore, using countplot may not make much sense here.

Instead, just use a normal pandas plot. For example, to show the 5 most frequent items in the list:

pd.Series(labels_array).value_counts()[:5].plot(kind="bar")

Complete example:

import string
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Draw 400 random lowercase letters with unequal probabilities
l = list(string.ascii_lowercase)
n = np.random.rand(len(l))
a = np.random.choice(l, p=n/n.sum(), size=400)

# Count each letter and plot only the 5 most frequent
s = pd.Series(a)
s.value_counts()[:5].plot(kind="bar")

plt.show()

Upvotes: 2
