Xerceo

Reputation: 55

Pandas: Using Append Adds New Column and Makes Another All NaN

I just started learning pandas a week ago or so and I've been struggling with a pandas dataframe for a bit now. My data looks like this:

State    NY   CA   Other  Total
Year
2003    450   50    25      525
2004    300   75     5      380
2005    500  100   100      700
2006    250   50   100      400 

I built this table from a dataset with 30 or so distinct values for the variable I'm calling State here. In this example, any value that wasn't NY or CA was summed into an 'Other' category. The years come from a normalized list of dates (originally a mix of mm/dd/yyyy and yyyy-mm-dd), extracted like this, in case it contributes to my issue:

dct = {'Date': pd.to_datetime(my_df.Date).dt.year}

and later:

my_df = my_df.rename_axis('Year')

I'm trying now to append a row at the bottom that shows the totals in each category:

final_df = my_df.append({'Year' : 'Total',
                         'NY': my_df.NY.sum(), 
                         'CA': my_df.CA.sum(), 
                         'Other': my_df.Other.sum(), 
                         'Total': my_df.Total.sum()}, 
                          ignore_index=True)

This does technically work, but it makes my table look like this:

         NY   CA   Other  Total  State
0       450   50    25      525    NaN
1       300   75     5      380    NaN
2       500  100   100      700    NaN
3       250   50   100      400    NaN
4         a    b     c        d   Total

('a' and so forth stand for the actual column totals.) It adds a numeric index at the beginning in place of the years and puts my 'Year' column at the end. It also removes the 'Date' label and turns all the years in that last column into NaNs.

Is there any way I can get this formatted properly? Thank you for your time.

Upvotes: 1

Views: 551

Answers (1)

jezrael

Reputation: 862511

I believe you need to create a Series with sum and rename it:

final_df = my_df.append(my_df.sum().rename('Total'))
print (final_df)
         NY   CA  Other  Total
State                         
2003    450   50     25    525
2004    300   75      5    380
2005    500  100    100    700
2006    250   50    100    400
Total  1500  275    230   2005
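
Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on newer versions the same idea can be written with pd.concat. The original attempt went wrong because 'Year' is the index rather than a column, so the dict key 'Year' became a new column, and ignore_index=True discarded the year labels. Below is a minimal sketch, assuming a frame rebuilt by hand from the table in the question (the sample construction is an assumption, not the asker's real data):

```python
import pandas as pd

# Sample frame rebuilt from the question's table (assumed, for illustration only).
my_df = pd.DataFrame(
    {
        "NY": [450, 300, 500, 250],
        "CA": [50, 75, 100, 50],
        "Other": [25, 5, 100, 100],
        "Total": [525, 380, 700, 400],
    },
    index=pd.Index([2003, 2004, 2005, 2006], name="Year"),
)

# Column sums as a Series named 'Total'; to_frame().T turns it into a
# one-row DataFrame whose index label is 'Total'.
total_row = my_df.sum().rename("Total").to_frame().T

# Stack the totals row under the original rows and restore the index name,
# which concat drops because the two index names differ.
final_df = pd.concat([my_df, total_row]).rename_axis("Year")
print(final_df)
```

Because the totals row is labelled through the index, no extra column is created and the year labels are kept.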

Another solution is to use loc for setting with enlargement:

my_df.loc['Total'] = my_df.sum()
print (my_df)
         NY   CA  Other  Total
State                         
2003    450   50     25    525
2004    300   75      5    380
2005    500  100    100    700
2006    250   50    100    400
Total  1500  275    230   2005

Another idea, from the previous answer: add the parameters margins=True and margins_name='Total' to crosstab:

df1 = df.assign(**dct)
out = (pd.crosstab(df1['Firing'], df1['State'], margins=True, margins_name='Total'))
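
Since df and dct come from the earlier answer and are not shown here, the following is a self-contained sketch of the margins idea on hypothetical data. The 'Year' and 'State' columns and their values are assumptions for illustration, and the result holds row counts rather than the sums from the question:

```python
import pandas as pd

# Hypothetical long-format data: one row per record (not the asker's dataset).
df = pd.DataFrame(
    {
        "Year": [2003, 2003, 2004, 2004, 2004],
        "State": ["NY", "CA", "NY", "NY", "Other"],
    }
)

# margins=True adds a totals row and a totals column;
# margins_name sets the label used for both.
out = pd.crosstab(df["Year"], df["State"], margins=True, margins_name="Total")
print(out)
```

This way the totals are produced when the table is built, so no row has to be appended afterwards.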

Upvotes: 2
