I have a DataFrame of names, states, years, sexes and the number of times each name appears. I am trying to plot a given name over the years, with all states combined.
allyears.head()
and here are the results:
name sex number year state
0 Mary F 7065 1880 FL
1 Anna F 2604 1880 NY
2 Emma F 2003 1880 AZ
3 Eli F 1939 1880 AS
4 Minnie F 1746 1880 AK
Then I set a MultiIndex:
allyears_indexed = allyears.set_index(['sex','name', 'state', 'year']).sort_index()
and plot a name with this function:
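import matplotlib.pyplot as pp  # pp is assumed to be matplotlib.pyplot

def plotname(sex, name):
    # Selects the sex and name levels; the result is still indexed by (state, year)
    data = allyears_indexed.loc[sex, name]
    pp.plot(data.index, data.values)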
Then I would like to plot all the "Emma"s over the years, with all states combined:
plotname('F', 'Emma')
but I get an error and an empty plot instead!
But when I add a 'state' parameter to the function and pass a state name in the call, I get the 'Emma's over the years in that particular state.
How can I get the counts over the years with all states combined, while keeping the same indexing pattern?
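A minimal sketch that rebuilds `allyears_indexed` from the five sample rows above shows what the failing call actually selects. Taking only the `sex` and `name` levels leaves a frame that is still indexed by `(state, year)`, so `data.index` holds `(state, year)` tuples rather than plain years, which is most likely what `pp.plot` cannot handle as x values. Passing a state as well consumes that level too, which would explain why the per-state version works.

```python
import pandas as pd

# The five sample rows from allyears.head() in the question
allyears = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Emma', 'Eli', 'Minnie'],
    'sex': ['F', 'F', 'F', 'F', 'F'],
    'number': [7065, 2604, 2003, 1939, 1746],
    'year': [1880, 1880, 1880, 1880, 1880],
    'state': ['FL', 'NY', 'AZ', 'AS', 'AK'],
})
allyears_indexed = allyears.set_index(['sex', 'name', 'state', 'year']).sort_index()

# Same selection as plotname('F', 'Emma'): only the first two index levels are consumed
data = allyears_indexed.loc['F', 'Emma']
print(data.index.names)  # the remaining levels: state and year
print(list(data.index))  # (state, year) tuples, not plain years
```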
I believe you first need to group on year and name, and then use loc to access the resulting data. The groupby sums the counts across all states.
df = allyears.groupby(['year', 'name'], as_index=False).number.sum()
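One thing to watch with this grouping: the key is only `year` and `name`, so if a name is recorded under both sexes, the F and M counts are added together, whereas the question's `plotname(sex, name)` keeps them apart. The sketch below uses the sample rows plus one hypothetical male "Emma" row (NY, 10) that is not in the question's data, added only to make the difference visible; adding `sex` to the key still sums over states but keeps the sexes separate.

```python
import pandas as pd

# Sample rows from the question, plus one HYPOTHETICAL male 'Emma' row (NY, 10)
# added only to show what happens when a name occurs under both sexes.
allyears = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Emma', 'Eli', 'Minnie', 'Emma'],
    'sex': ['F', 'F', 'F', 'F', 'F', 'M'],
    'number': [7065, 2604, 2003, 1939, 1746, 10],
    'year': [1880] * 6,
    'state': ['FL', 'NY', 'AZ', 'AS', 'AK', 'NY'],
})

# Grouping on year and name only: F and M counts for the same name are added together
combined = allyears.groupby(['year', 'name'], as_index=False)['number'].sum()

# Keeping sex in the key: states are still summed, but F and M stay separate
by_sex = allyears.groupby(['sex', 'year', 'name'], as_index=False)['number'].sum()
emma_f = by_sex.loc[(by_sex['sex'] == 'F') & (by_sex['name'] == 'Emma'), ['year', 'number']]

print(combined.loc[combined['name'] == 'Emma'])  # number is 2013 (2003 + 10)
print(emma_f)                                    # number is 2003
```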
>>> df
year name number
0 1880 Anna 2604
1 1880 Eli 1939
2 1880 Emma 2003
3 1880 Mary 7065
4 1880 Minnie 1746
>>> df.loc[df.name == 'Emma']
year name number
2 1880 Emma 2003
And to plot it:
df.loc[df.name == 'Emma', ['year', 'number']].set_index('year').plot(title='Emma')
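If you would rather keep the question's `(sex, name, state, year)` MultiIndex, one possible variant (a sketch, not checked against the full dataset) is to select the `sex` and `name` levels with `DataFrame.xs`, which drops them and leaves a `(state, year)` index, and then collapse the `state` level with `groupby(level='year')`. The helper name `name_totals` is hypothetical, and the two extra "Emma" rows (CA 1880 and NY 1881) are invented purely to show summing across states and across years.

```python
import pandas as pd

# Sample rows from the question plus two HYPOTHETICAL Emma rows
# (CA in 1880, NY in 1881) so that both the state sum and the year axis are visible.
allyears = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Emma', 'Eli', 'Minnie', 'Emma', 'Emma'],
    'sex': ['F'] * 7,
    'number': [7065, 2604, 2003, 1939, 1746, 500, 1200],
    'year': [1880, 1880, 1880, 1880, 1880, 1880, 1881],
    'state': ['FL', 'NY', 'AZ', 'AS', 'AK', 'CA', 'NY'],
})
allyears_indexed = allyears.set_index(['sex', 'name', 'state', 'year']).sort_index()


def name_totals(indexed, sex, name):
    """Hypothetical helper: yearly totals for one sex/name pair, summed over states."""
    # xs() selects the sex and name levels and drops them, leaving (state, year)
    data = indexed.xs((sex, name), level=['sex', 'name'])
    # Collapse the state level by summing every state's count within each year
    return data.groupby(level='year')['number'].sum()


totals = name_totals(allyears_indexed, 'F', 'Emma')
print(totals)  # 1880 -> 2503 (2003 + 500), 1881 -> 1200
```

To draw it, `totals.plot(title='Emma')` should work (it needs matplotlib installed). Returning the Series instead of plotting inside the function keeps the aggregation easy to inspect before plotting.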