Python List Comprehension Returning Wrong Labels

Question

Below is a sample of my dataset:

Donor State       Donation    Year
Illinois          50          2013
other             99          2016
California        26          2017
Indiana           50          2017
New Jersey        50          2017
Connecticut       25          2016
Wisconsin         10          2017
New York          100         2017
Indiana           74.32       2015
Indiana           25          2015
Pennsylvania      10          2014
Pennsylvania      10          2013
New Jersey        5           2018
New Jersey        5           2018

What I want to do is look at the donations for a specific set of states over the course of the different years which I've managed to do, but for some reason, the index names aren't returning what I need which is making it hard to determine what state is what.

state_list = ['Illinois','other','California','Indiana','New Jersey','Connecticut','Wisconsin','New York','Pennsylvania','Delaware']

I wrote a simple list comprehension which:

pd.DataFrame([df[df['Donor State'] == i].groupby(['Year'])["Donation"].sum() for i in state_list])

I get the following:

Year            2013    2014    2015    2016    2017    2018
Donation Amount 211.48  672.82  265.00  477.95  1900.01 52.00
Donation Amount NaN     NaN     120.00  99.00   25.00   NaN
Donation Amount 414.50  2767.12 2158.36 391.02  1839.95 50.00
Donation Amount 20.00   NaN     274.32  85.00   367.44  100.00
Donation Amount 100.00  100.00  199.95  624.00  719.00  245.00
Donation Amount 125.00  30.00   25.00   870.56  105.00  215.00
Donation Amount NaN     NaN     NaN     100.00  191.66  75.00
Donation Amount 235.00  85.00   849.46  1110.00 446.59  87.18
Donation Amount 160.00  205.00  124.00  250.00  150.00  25.00
Donation Amount NaN     NaN     110.00  52.89   100.00  NaN

What I want:

Year            2013    2014    2015    2016    2017    2018
Illinois        211.48  672.82  265.00  477.95  1900.01 52.00
California      NaN     NaN     120.00  99.00   25.00   NaN
Pennsylvania    414.50  2767.12 2158.36 391.02  1839.95 50.00
Delaware        20.00   NaN     274.32  85.00   367.44  100.00
...............................................................

Is there a way where this can pick up the state name label and make it the index? The dataframe is exactly what I want, but I need 'Donation Amount' in the index to be the state name.

Sorry if this is a stupid question and I've missed something glaringly obvious

jpp · Accepted Answer

pivot_table + reindex

You can use pivot_table to transform your dataframe and then reindex to specify the states which interest you.

state_list = ['Illinois','other','California','Indiana','NewJersey','Connecticut',
              'Wisconsin','NewYork','Pennsylvania','Delaware']

res = df.pivot_table(index='DonorState', columns='Year', values='Donation',
                     aggfunc=sum, fill_value=0)\
        .reindex(state_list, axis=0)

print(res)

Year          2013  2014   2015  2016   2017  2018
DonorState                                        
Illinois      50.0   0.0   0.00   0.0    0.0   0.0
other          0.0   0.0   0.00  99.0    0.0   0.0
California     0.0   0.0   0.00   0.0   26.0   0.0
Indiana        0.0   0.0  99.32   0.0   50.0   0.0
NewJersey      0.0   0.0   0.00   0.0   50.0  10.0
Connecticut    0.0   0.0   0.00  25.0    0.0   0.0
Wisconsin      0.0   0.0   0.00   0.0   10.0   0.0
NewYork        0.0   0.0   0.00   0.0  100.0   0.0
Pennsylvania  10.0  10.0   0.00   0.0    0.0   0.0
Delaware       NaN   NaN    NaN   NaN    NaN   NaN

Python List Comprehension Returning Wrong Labels

Answers (1)

pivot_table + reindex

Related Questions