Reputation: 137
I have a dataset and want to plot the top 10 values versus everything else. To elaborate, I want to sum all the values that are not in the top 10 and add that total to, say, a pie chart as a slice labeled "others", alongside the top 10. Currently I have the following code, where X is my dataframe:
temp = X.SOME_IDENTIFIER.value_counts()
temp.head(10).plot(kind='pie')
This gets me a pie chart of just the top ten, but I do not want to discard the other values from the dataframe. I want to add them as an eleventh slice on the chart but am not sure how to do this. Any help or advice is appreciated.
Upvotes: 1
Views: 22302
Reputation: 9
Here's how I approached the problem:
import pandas as pd
# value_counts() already sorts by count in descending order
counts = X.SOME_IDENTIFIER.value_counts()
temp = counts.head(10)
df = pd.DataFrame({'XX': temp.index, 'Y': temp.values})
# Add one row holding the sum of everything outside the top 10
df.loc[len(df)] = ['Other', counts.iloc[10:].sum()]
df.set_index('XX').plot(kind='pie', y='Y')
Explanation: I stored the top 10 counts in a dataframe, computed the sum of the remaining counts from the series, appended that sum to the dataframe as a row labeled Other, and plotted a pie chart from the dataframe. This should give the chart you are after.
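The same idea can be wrapped in a small helper. This is only a sketch: the function name top_n_with_other, its parameters, and the sample data are made up for illustration, and it uses pd.concat rather than a dataframe row to attach the "Other" entry.

```python
import pandas as pd

def top_n_with_other(values, n=10, other_label="Other"):
    """Count values, keep the n most frequent, and sum the rest into one entry."""
    counts = values.value_counts()        # sorted by count, descending
    top = counts.iloc[:n]
    rest_total = counts.iloc[n:].sum()    # 0 when there are n or fewer distinct values
    if rest_total == 0:
        return top
    other = pd.Series({other_label: rest_total})
    return pd.concat([top, other])

# Made-up data: 12 distinct labels, where label id_i appears i + 1 times
sample = pd.Series([f"id_{i}" for i in range(12) for _ in range(i + 1)])
result = top_n_with_other(sample, n=10)
```

With the question's dataframe, top_n_with_other(X.SOME_IDENTIFIER).plot(kind='pie') would then draw the ten largest slices plus one combined slice.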
Upvotes: 0
Reputation: 21
Using pandas:
import matplotlib.pyplot as plt
# Count occurrences of each value; value_counts() returns a Series sorted in descending order
s_temp = X.SOME_IDENTIFIER.value_counts()
# Sum the counts not in the top ten (0 if there are ten or fewer values)
not_top_ten_sum = s_temp.iloc[10:].sum()
# Get the top ten counts (copy so the assignment below does not touch s_temp)
s_top = s_temp.head(10).copy()
# Append the sum of not-top-ten counts to the Series
s_top['Other'] = not_top_ten_sum
# Plot pie chart
_ = s_top.plot.pie()
# Show plot
plt.show()
Upvotes: 2
Reputation: 109546
Assign the top ten counts to a new Series (temp2), then add an entry that sums any remaining items. The label of the new entry also reports how many distinct items it combines.
temp = X.SOME_IDENTIFIER.value_counts()
# Copy so that the new entry is added to temp2 only
temp2 = temp.head(10).copy()
if len(temp) > 10:
    temp2['remaining {0} items'.format(len(temp) - 10)] = temp.iloc[10:].sum()
temp2.plot(kind='pie')
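A different route, in case it is useful, is to relabel the less frequent values before counting instead of summing them afterwards. The sketch below is not the method above but a variant built on Series.isin and Series.where; the contents of X are made up so the snippet runs on its own.

```python
import pandas as pd

# Made-up stand-in for the question's dataframe X:
# 15 distinct labels, where label id_i appears i + 1 times
X = pd.DataFrame(
    {"SOME_IDENTIFIER": [f"id_{i}" for i in range(15) for _ in range(i + 1)]}
)

# Labels of the ten most frequent values
top_labels = X.SOME_IDENTIFIER.value_counts().index[:10]

# Keep labels that are in the top ten; replace every other label with "Other"
relabeled = X.SOME_IDENTIFIER.where(X.SOME_IDENTIFIER.isin(top_labels), "Other")

# Counting the relabeled column gives the top ten plus one combined entry
grouped = relabeled.value_counts()
```

Note that value_counts() sorts by count, so the "Other" entry is not guaranteed to come last here; grouped.plot(kind='pie') would place it wherever its total ranks.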
Upvotes: 6