Reputation: 79
My goal is to create a bar graph from my .csv data that shows the relationship between work year (x) and wage (y), grouped by gender (separate bars).
First, I want to split the variable 'workyear' into three groups: (1) more than 10 years, (2) exactly 10 years, and (3) less than 10 years. Then I would like to create the bar graph with gender (1=female, 0=male).
Part of my data looks like this:
... workyear gender wage
513 12 0 15.00
514 16 0 12.67
515 14 1 7.38
516 16 0 15.56
517 12 1 7.45
518 14 1 6.25
519 16 1 6.25
520 17 0 9.37
....
To do this, I tried to replace the values of 'workyear' with three group labels, and I used matplotlib for the plot.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
#load data
df=pd.DataFrame.from_csv('data.csv', index_col=None)
print(df)
df.sort_values("workyear", ascending=True, inplace=True)
#parameters
bar_width = 0.2
#replacing Education year -> Education level grouped by given criteria.
#But I got an error.
df.loc[df.workyear<10, 'workyear'] = 'G1'
df.loc[df.workyear==10, 'workyear'] = 'G2'
df.loc[df.workyear>10, 'workyear']='G3'
#plotting
plt.bar(x, df.education[df.gender==1], bar_width, yerr=df.wage,color='y', label='female')
plt.bar(x+bar_width, df.education[df.gender==0], bar_width, yerr=df.wage, color='c', label='male')
I want to see the bar graph like this (please consider '+' as a bar):
y=wage| + +
| + + + +
| + + + + +
| + + + + + +
|_______________________ x=work year (3-group)
>10 10 10<
But this is what I actually got... (yes. all errors)
Traceback (most recent call last):
File "data.py", line 21, in <module>
df.loc[df.workyear>10, 'workyear']='G3'
in wrapper
res = na_op(values, other)
in na_op
result = _comp_method_OBJECT_ARRAY(op, x, y)
in _comp_method_OBJECT_ARRAY
result = lib.scalar_compare(x, y, op)
File "pandas\_libs\lib.pyx", line 769, in pandas._libs.lib.scalar_compare (pandas\_libs\lib.c:13717)
TypeError: unorderable types: str() > int()
Could you please advise me?
Upvotes: 1
Views: 1742
Reputation: 18628
A more direct way:
df['Age']=pd.cut(df.workyear,[1,13,14,100])
df['Gender']=df.gender.map({0:'male',1:'female'})
df.pivot_table(values='wage',index='Age',columns='Gender').plot.bar()
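As for the TypeError in the question: it most likely comes from overwriting 'workyear' in place. Once the first assignment writes 'G1' into the column, the column holds strings mixed with numbers, so the later `df.workyear > 10` ends up comparing a str with an int. A minimal sketch of one way around this, assuming the three groups from the question (<10, 10, >10) and using only the sample rows shown there, is to write the labels into a new column and leave 'workyear' numeric. The column names `group` and `sex` are made up for illustration, and `np.select` is just one option (`pd.cut` as above works too):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import numpy as np
import pandas as pd

# Sample rows copied from the question; the real data.csv has more rows.
df = pd.DataFrame({
    "workyear": [12, 16, 14, 16, 12, 14, 16, 17],
    "gender":   [0, 0, 1, 0, 1, 1, 1, 0],
    "wage":     [15.00, 12.67, 7.38, 15.56, 7.45, 6.25, 6.25, 9.37],
})

# Put the labels in a NEW column ('group' is a hypothetical name),
# so 'workyear' stays numeric and every comparison still works.
labels = ["<10", "10", ">10"]
conditions = [df.workyear < 10, df.workyear == 10, df.workyear > 10]
df["group"] = np.select(conditions, labels, default="n/a")

# Readable gender labels for the legend ('sex' is a hypothetical name).
df["sex"] = df.gender.map({0: "male", 1: "female"})

# Mean wage per (group, sex); reindex keeps all three groups on the x axis
# in a fixed order, even when a group has no rows in the data.
table = (
    df.groupby(["group", "sex"])["wage"]
      .mean()
      .unstack("sex")
      .reindex(labels)
)
print(table)

# One cluster of bars per group, one bar per gender.
ax = table.plot.bar(rot=0)
ax.set_xlabel("work year (3 groups)")
ax.set_ylabel("mean wage")
```

With only the sample rows, every value of 'workyear' is above 10, so the "<10" and "10" groups stay empty; on the full file they should fill in. Call `ax.figure.savefig(...)` or `plt.show()` to actually see the chart.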
Upvotes: 1