So I'm trying to make an age interval column for my dataframe:
df['age_interval'] = pd.cut(x=df['Age'], bins=[18, 22, 27, 32, 37, 42, 47, 52, 57, 60], include_lowest=True)
And I added my graph:
Problem: In the visualization the [18-22] bin is displayed as [17.99-22].
What I want: I want it to display [18-22].
Below is the plot code:
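import matplotlib.pyplot as plt
import seaborn as sns

# ibm_ages (presumably the count per age interval) and total (the overall
# count) are defined elsewhere; they are not shown in the question.
plt.figure(figsize=(15, 8))
dist = sns.barplot(x=ibm_ages.index, y=ibm_ages.values, color='blue')
dist.set_title('IBM Age Distribution', fontsize=24)
dist.set_xlabel('Age Range', fontsize=18)
dist.set_ylabel('Total Count', fontsize=18)

# Annotate each bar with its share of the total, as a percentage.
sizes = []
for p in dist.patches:
    height = p.get_height()
    sizes.append(height)
    dist.text(p.get_x() + p.get_width() / 2.,
              height + 5,
              '{:1.2f}%'.format(height / total * 100),
              ha="center", fontsize=8)
plt.tight_layout(h_pad=3)
plt.show()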
Answer:
A bar plot is misleading here because the bins do not all have the same width: 18-22 spans 4 years, 57-60 spans 3, and every other bin spans 5. Age is also a continuous variable, so why obscure the fact that adjacent bins share their edges?
This is precisely the setting where a histogram is useful. You can still customize the bins and set the tick marks to match them. The other plot customizations work just the same.
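import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

sns.set_theme()

# Synthetic ages for illustration (mean 35, standard deviation 10).
df = pd.DataFrame({'Age': np.random.normal(35, 10, 1000)})
bins = [18, 22, 27, 32, 37, 42, 47, 52, 57, 60]

# Use the same custom edges for the bins and the x-axis ticks.
# Ages outside 18-60 fall outside the bins and are not counted.
ax = sns.histplot(data=df, x='Age', bins=bins)
ax.set_xticks(bins)
plt.show()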
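Because the bins are not all the same width, raw counts still favour the wider bins, even in a histogram. One option, sketched below with NumPy on made-up ages, is to divide each count by its bin width so that bar area, rather than height, carries the count. As far as I know, seaborn's `histplot` exposes the same normalisation through `stat='density'`; the sample data and seed here are assumptions for illustration only.

```python
import numpy as np

# Hypothetical ages, drawn like the example above but with a fixed seed.
rng = np.random.default_rng(0)
ages = rng.normal(35, 10, 1000)

bins = [18, 22, 27, 32, 37, 42, 47, 52, 57, 60]
counts, edges = np.histogram(ages, bins=bins)  # values outside 18-60 are dropped
widths = np.diff(edges)                        # 4, 5, 5, 5, 5, 5, 5, 5, 3 years

# Density: count per year of bin width, scaled so the total area is 1.
density = counts / (counts.sum() * widths)

for lo, hi, c, d in zip(edges[:-1], edges[1:], counts, density):
    print(f'{lo}-{hi}: count={c}, density={d:.4f}')
```

Plotting `density` instead of `counts` keeps the 3-year and 4-year bins from looking artificially small next to the 5-year ones.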
Answer:
That's because the column is float64 and you want integers. Try:
import pandas as pd
df['age_interval'] = pd.cut(x=df['Age'].astype('Int64'), bins=[18, 22, 27, 32, 37, 42, 47, 52, 57, 60], include_lowest=True)
You can use .astype('Int64') whenever you want to convert float64 to pandas' nullable Int64 type. This only works when the float values are whole numbers; pandas raises an error for values with a fractional part.
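Converting the column to integers may not change the first label by itself: with `include_lowest=True`, pandas formats the left edge of the first interval slightly below the first bin edge, which is where the "17.99" in the question comes from. A sketch that sidesteps the formatting entirely passes explicit strings through the `labels` parameter of `pd.cut`. The sample ages below are hypothetical; in the question they would come from `df['Age']`.

```python
import pandas as pd

# Hypothetical sample ages standing in for the question's df['Age'].
df = pd.DataFrame({'Age': [18, 19, 22, 23, 30, 41, 59, 60]})

bins = [18, 22, 27, 32, 37, 42, 47, 52, 57, 60]
# One "low-high" string per interval, built from consecutive bin edges.
labels = [f'{lo}-{hi}' for lo, hi in zip(bins[:-1], bins[1:])]

# Intervals are still right-closed, so an age of 22 lands in '18-22';
# include_lowest=True keeps an age of exactly 18 in the first bin.
df['age_interval'] = pd.cut(df['Age'], bins=bins, labels=labels, include_lowest=True)
print(df)
```

The result is still an ordered categorical, so the bar plot's x-axis should show these strings in bin order; change the f-string if you prefer another format such as `[18-22]`.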