LearningSlowly

Reputation: 9431

python ggplot geom_bar y axis incorrect values

df:

duration  status    line
75526     Good      A
75526     Muy buen  B
75546     pas mal   C
75516     loco      D

I am plotting via:

p = ggplot(aes(x='status',weight='duration',fill='line'),data=df) + geom_bar(stat='identity')

Importantly, I am using stat='identity' so that the y-axis shows the column values rather than a count or density. Yet the y-axis values in the plot are incorrect.

When I compute the maximum duration, it is around the 86,000 mark (i.e. roughly 24 hours in seconds). Why is the plot showing values in excess of 250,000 seconds?


Upvotes: 1

Views: 1082

Answers (2)

gereleth

Reputation: 2482

This plot groups the dataframe by status and line and uses the sum of the durations (the weights) in each group as the bar height. Some groups must contain multiple rows; that is where the extra-tall bars come from.
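To check whether this is what is happening, you can total the durations per (status, line) group yourself and compare the result with the largest single duration. A minimal sketch in plain Python, using made-up rows rather than the asker's real data (the question shows only four rows), might look like this:

```python
from collections import defaultdict

# Hypothetical rows as (duration, status, line); NOT the asker's real data.
# The group ("Good", "A") deliberately appears three times.
rows = [
    (75526, "Good", "A"),
    (75526, "Good", "A"),
    (75516, "Good", "A"),
    (75546, "pas mal", "C"),
]

# Sum the durations per (status, line) group, as described above.
totals = defaultdict(int)
for duration, status, line in rows:
    totals[(status, line)] += duration

largest_single = max(duration for duration, _, _ in rows)
tallest_bar = max(totals.values())

print(largest_single)  # largest value in the duration column
print(tallest_bar)     # largest per-group sum, i.e. the tallest bar

# With a pandas DataFrame, the same per-group totals would be:
#   df.groupby(['status', 'line'])['duration'].sum()
```

If the largest per-group sum is well above the largest single duration, repeated (status, line) combinations are the likely cause; aggregating the dataframe to one row per group before plotting would then give the bar heights you expect.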

Upvotes: 1

igauravsehrawat

Reputation: 3954

I am guessing, from the incomplete information you have provided, that you want to put a limit on the y-axis. For that you can use ylim, e.g. ylim(low=0, high=864000), so your command will look like this:

p = ggplot(aes(x='status',weight='duration',fill='line'),data=df) + geom_bar(stat='identity') + ylim(low=0, high=864000)

Let me know if this is what you were after.

Cheers

Upvotes: 0
