Reputation: 3412
I have a Python array listing all occurrences of string labels. Let's call it labels_array. Using seaborn as sns, I'd like to show a countplot of this array:
sns.countplot(labels_array)
This works, but as there are too many different labels in my array, the output doesn't look good.
Is there a way to display only the n most frequent labels?
Upvotes: 0
Views: 11445
Reputation: 301
You can use pd.value_counts() to get your occurrences sorted by frequency.
To get the N most frequent labels, you can simply write pd.value_counts(labels_array).iloc[:N].index (the index holds the labels).
You can pass this to the order parameter of countplot, and it should look like this:
sns.countplot(labels_array, order=pd.value_counts(labels_array).iloc[:N].index)
Upvotes: 5
Reputation: 21
I came across the same problem (and this question) and found that this question has already been answered.
The countplot function has the parameter order, where you can specify the values for which you want to plot the counts.
The most frequent values can be obtained, as previously stated, with the value_counts function.
See: limit the number of groups shown in seaborn countplot?
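As a rough sketch of how the order parameter can be fed with the most frequent labels: the helper names top_n_labels and plot_top_n below are hypothetical, and collections.Counter stands in for value_counts so that the counting part runs without pandas. Passing the labels through the x keyword is an assumption about the installed seaborn version.

```python
from collections import Counter


def top_n_labels(labels, n):
    """Return the n most frequent labels, most frequent first."""
    return [label for label, _ in Counter(labels).most_common(n)]


def plot_top_n(labels, n):
    """Draw a countplot restricted to the n most frequent labels."""
    # seaborn is imported lazily so top_n_labels stays usable without it.
    import seaborn as sns

    # Labels that are not listed in `order` are left out of the plot.
    return sns.countplot(x=list(labels), order=top_n_labels(labels, n))


labels_array = ["a", "b", "a", "c", "a", "b", "d"]
print(top_n_labels(labels_array, 2))  # ['a', 'b']
```

Calling plot_top_n(labels_array, 2) followed by matplotlib's plt.show() should then display bars for the two most frequent labels only.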
Upvotes: 1
Reputation: 339122
Although countplot should in principle know the counts and hence allow showing only part of them, this is not the case. Therefore, using countplot may not make much sense here.
Instead just use a normal pandas plot. E.g. to show the 5 most frequent items in the list,
pandas.Series(labels_array).value_counts()[:5].plot(kind="bar")
Complete example:
import string
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
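# Sample data: 400 letters drawn with random, unequal probabilities
l = list(string.ascii_lowercase)
n = np.random.rand(len(l))
a = np.random.choice(l, p=n/n.sum(), size=400)
# value_counts() sorts by frequency, so the first 5 entries are the most common
s = pd.Series(a)
s.value_counts()[:5].plot(kind="bar")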
plt.show()
Upvotes: 2