user10341548

Reputation: 367

Is the barplot in matplotlib using the mean?

I have a dataset df:

users  number   
user1   1          
user2   34       
user3   56      
user4   45      
user5   4
user1   3
user5   11
user1   3

When I make a bar plot like this:

plt.bar(x['users'], x['number'].sort_values(ascending=False), color="blue")

Does it plot the mean of the number column for each user? And what if I want the sum of the number column for each user to appear in the bar plot, in descending order?

I tried this:

plt.bar(x['users'], x['number'].sum().sort_values(ascending=False), color="blue")

which gives:

AttributeError: 'numpy.float64' object has no attribute 'sort_values'

Code:

import pandas as pd
import matplotlib.pyplot as plt  # needed for plt.bar

df = pd.DataFrame({'number': [10,34,56,45,33],
'user': ['user1','user2','user3','user4','user1']})
#index=['user1','user2','user3','user4','user1'])
plt.bar(df['user'], df['number'], color="blue")

[image: bar chart produced by the code above]

For a user that appears more than once, the plot only ever shows the largest value.

Upvotes: 2

Views: 7775

Answers (1)

Sheldore

Reputation: 39072

I am not sure whether this is what you want, or whether you want to first group the values by user and then plot the totals in descending order. The following sorts the rows by number and plots each row as its own bar:

x = x.sort_values('number',ascending=False)
plt.bar(range(len(x['users'])), x['number'], color="blue")
plt.xticks(range(len(x['users'])), x['users'])
plt.ylabel('Numbers')

Output: [image: bar chart of the rows sorted by number in descending order]

If you want to plot the mean of each user, use the following code:

x1 = x.groupby('users').mean().reset_index()
plt.bar(range(len(x1)), x1['number'], color="blue")
plt.xticks(range(len(x1)), x1['users'])
plt.ylabel('Mean')

Output: [image: bar chart of the mean number per user]
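For the sum asked about in the question, the same groupby pattern should work with sum() in place of mean(). The AttributeError in the question comes from calling x['number'].sum() on the whole column, which returns a single scalar that has no sort_values method; grouping by user first gives one total per user, and that result can be sorted. A minimal sketch, assuming the column names users and number from the question (totals is just an illustrative variable name):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Data copied from the question
x = pd.DataFrame({
    'users': ['user1', 'user2', 'user3', 'user4',
              'user5', 'user1', 'user5', 'user1'],
    'number': [1, 34, 56, 45, 4, 3, 11, 3],
})

# One total per user, sorted with the largest total first
totals = x.groupby('users')['number'].sum().sort_values(ascending=False)

# After grouping, each user appears exactly once, so the labels can be used directly
plt.bar(totals.index, totals.values, color="blue")
plt.ylabel('Sum')
# Call plt.show() to display the figure when running as a script
```

Here user1 ends up with a total of 7 (1 + 3 + 3) and is drawn last, after user3, user4, user2 and user5.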

What happens if you don't sort or group: all bars are still drawn, but bars that share the same x-value sit on top of each other, and because alpha=1 by default (fully opaque) you only see the tallest one. Below I use alpha=0.2 to make this visible: at user1 there are two bars, one behind the other.

import pandas as pd
import matplotlib.pyplot as plt  # needed for plt.bar

# the 'user' list was missing its opening bracket
df = pd.DataFrame({'number': [10,34,56,45,51], 'user': ['user1','user2','user3','user4','user1']})
plt.bar(df['user'], df['number'], color="blue", linewidth=2, edgecolor='black', alpha=0.2)

Output: [image: bar chart with semi-transparent bars; the two user1 bars overlap]
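If you would rather check this without relying on transparency, one option is to inspect the container that plt.bar returns. This is only a sketch using the same example data; bars is an illustrative variable name:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'number': [10, 34, 56, 45, 51],
                   'user': ['user1', 'user2', 'user3', 'user4', 'user1']})

# plt.bar returns a container holding one rectangle per row passed in
bars = plt.bar(df['user'], df['number'], color="blue")

# Five rows give five rectangles, even though there are only four distinct users
print(len(bars))

# The first and last rows are both user1, so their rectangles share a left edge
print(bars[0].get_x() == bars[4].get_x())

# Each rectangle keeps its own height; nothing is averaged or summed
print(bars[0].get_height(), bars[4].get_height())
```

So matplotlib does no aggregation of its own here; any mean or sum has to be computed in pandas before plotting.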

Upvotes: 2
