Jaye C.

Reputation: 11

Binning and Visualization with Pandas

So I'm trying to make an age interval column for my dataframe:

df['age_interval'] = pd.cut(x=df['Age'], bins=[18, 22, 27, 32, 37, 42, 47, 52, 57, 60], include_lowest=True)

And I added my graph:

Visualization

Problem: In the visualization the [18-22] bin is displayed as [17.99-22].

What I want: I want it to display [18-22].

Below is the plot code:

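import matplotlib.pyplot as plt
import seaborn as sns

# ibm_ages and total are defined elsewhere (not shown here); presumably
# ibm_ages holds the counts per age interval and total the overall row count.
plt.figure(figsize=(15, 8))
dist = sns.barplot(x=ibm_ages.index, y=ibm_ages.values, color='blue')
dist.set_title('IBM Age Distribution', fontsize=24)
dist.set_xlabel('Age Range', fontsize=18)
dist.set_ylabel('Total Count', fontsize=18)

# Write each bar's share of the total just above the bar
sizes = []
for p in dist.patches:
    height = p.get_height()
    sizes.append(height)
    dist.text(p.get_x() + p.get_width() / 2.,
              height + 5,
              '{:1.2f}%'.format(height / total * 100),
              ha="center", fontsize=8)

plt.tight_layout(h_pad=3)
plt.show()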

Upvotes: 1

Views: 1068

Answers (2)

Arne

Reputation: 10545

A bar plot is misleading here, because the bins do not have equal width. Age is a continuous variable. Why obscure the fact that the bins border each other?
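To make the unequal widths concrete, here is a quick check on the bin edges from the question. It is only a sketch of the arithmetic; the name `widths` is introduced here for illustration.

```python
import numpy as np

# Bin edges copied from the question
bins = [18, 22, 27, 32, 37, 42, 47, 52, 57, 60]

# Difference between consecutive edges = width of each bin in years
widths = np.diff(bins)
print(widths)
```

The first bin spans 4 years, the last one 3, and the rest 5, so bars of equal width would not be directly comparable.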

This is precisely the setting where a histogram is useful. You can still customize the bins and set the tick marks accordingly. The other plot customizations also work all the same.

import numpy as np
import pandas as pd
import seaborn as sns
sns.set()

df = pd.DataFrame({'Age': np.random.normal(35, 10, 1000)})
bins = [18, 22, 27, 32, 37, 42, 47, 52, 57, 60]

ax = sns.histplot(data=df, x='Age', bins=bins)
ax.set_xticks(bins)

Age histogram

Upvotes: 1

Juned Khan

Reputation: 122

That's because the column has type float64 and you want integers. Try:

import pandas as pd
df['age_interval'] = pd.cut(x=df['Age'].astype('Int64'), bins=[18, 22, 27, 32, 37, 42, 47, 52, 57, 60], include_lowest=True)

You can use .astype('Int64') whenever you want to convert float64 to Int64, provided the float values are whole numbers.
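If the first label still shows a left edge just below 18 after the cast, that is likely because `include_lowest=True` shifts the first edge down slightly so that 18 itself falls inside the bin. One way around it, sketched here with hypothetical sample data (the question's `df` is not shown), is to pass explicit `labels` to `pd.cut`:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's dataframe
df = pd.DataFrame({'Age': [18, 21, 22, 23, 35, 60]})
bins = [18, 22, 27, 32, 37, 42, 47, 52, 57, 60]

# Default labels: the first interval's left edge sits slightly below 18
default = pd.cut(df['Age'], bins=bins, include_lowest=True)
print(default.cat.categories[0])

# Build one text label per bin from consecutive edges
labels = [f'[{lo}-{hi}]' for lo, hi in zip(bins[:-1], bins[1:])]
df['age_interval'] = pd.cut(df['Age'], bins=bins, labels=labels,
                            include_lowest=True)
print(df)
```

Note that the bins are still closed on the right, so an age of exactly 22 lands in `[18-22]`, not `[22-27]`; the labels are display text only and do not change how values are assigned.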

Upvotes: 1
