Reputation: 393
Below is a sample of my dataset:
Donor State Donation Year
Illinois 50 2013
other 99 2016
California 26 2017
Indiana 50 2017
New Jersey 50 2017
Connecticut 25 2016
Wisconsin 10 2017
New York 100 2017
Indiana 74.32 2015
Indiana 25 2015
Pennsylvania 10 2014
Pennsylvania 10 2013
New Jersey 5 2018
New Jersey 5 2018
What I want to do is look at the donations for a specific set of states over the course of the different years which I've managed to do, but for some reason, the index names aren't returning what I need which is making it hard to determine what state is what.
state_list = ['Illinois','other','California','Indiana','New Jersey','Connecticut','Wisconsin','New York','Pennsylvania','Delaware']
I wrote a simple list comprehension which:
pd.DataFrame([df[df['Donor State'] == i].groupby(['Year'])["Donation"].sum() for i in state_list])
I get the following:
Year 2013 2014 2015 2016 2017 2018
Donation Amount 211.48 672.82 265.00 477.95 1900.01 52.00
Donation Amount NaN NaN 120.00 99.00 25.00 NaN
Donation Amount 414.50 2767.12 2158.36 391.02 1839.95 50.00
Donation Amount 20.00 NaN 274.32 85.00 367.44 100.00
Donation Amount 100.00 100.00 199.95 624.00 719.00 245.00
Donation Amount 125.00 30.00 25.00 870.56 105.00 215.00
Donation Amount NaN NaN NaN 100.00 191.66 75.00
Donation Amount 235.00 85.00 849.46 1110.00 446.59 87.18
Donation Amount 160.00 205.00 124.00 250.00 150.00 25.00
Donation Amount NaN NaN 110.00 52.89 100.00 NaN
What I want:
Year 2013 2014 2015 2016 2017 2018
Illinois 211.48 672.82 265.00 477.95 1900.01 52.00
California NaN NaN 120.00 99.00 25.00 NaN
Pennsylvania 414.50 2767.12 2158.36 391.02 1839.95 50.00
Delaware 20.00 NaN 274.32 85.00 367.44 100.00
...............................................................
Is there a way where this can pick up the state name label and make it the index? The dataframe is exactly what I want, but I need 'Donation Amount' in the index to be the state name.
Sorry if this is a stupid question and I've missed something glaringly obvious
Upvotes: 2
Views: 66
Reputation: 164683
You can use pivot_table
to transform your dataframe and then reindex
to specify the states which interest you.
state_list = ['Illinois','other','California','Indiana','NewJersey','Connecticut',
'Wisconsin','NewYork','Pennsylvania','Delaware']
res = df.pivot_table(index='DonorState', columns='Year', values='Donation',
aggfunc=sum, fill_value=0)\
.reindex(state_list, axis=0)
print(res)
Year 2013 2014 2015 2016 2017 2018
DonorState
Illinois 50.0 0.0 0.00 0.0 0.0 0.0
other 0.0 0.0 0.00 99.0 0.0 0.0
California 0.0 0.0 0.00 0.0 26.0 0.0
Indiana 0.0 0.0 99.32 0.0 50.0 0.0
NewJersey 0.0 0.0 0.00 0.0 50.0 10.0
Connecticut 0.0 0.0 0.00 25.0 0.0 0.0
Wisconsin 0.0 0.0 0.00 0.0 10.0 0.0
NewYork 0.0 0.0 0.00 0.0 100.0 0.0
Pennsylvania 10.0 10.0 0.00 0.0 0.0 0.0
Delaware NaN NaN NaN NaN NaN NaN
Upvotes: 2