jessica

Reputation: 29

How to assign column names to a newly created pandas DataFrame

I am trying to assign column names to a newly created dataframe, so that I can then reference the data by column name.

First, I created a new dataframe called sums by applying a sum across a set of columns:

sums = data.iloc[:, 62:75].apply(np.sum)

The sums.head(5) results are:

SCH_ENR_HI_F        66134
SCH_ENR_AM_M        3771
SCH_ENR_AM_F        3588
SCH_ENR_AS_M        13388
SCH_ENR_AS_F        12845

I want to add column headers 'student_type' and 'enrollment' so I tried:

sums.columns = ['student_type', 'enrollment']

which didn't work. I don't get an error on that line, but later, when referencing the column, I get KeyError: 'enrollment'.

What is the best-practice way to accomplish this?

Upvotes: 2

Views: 8652

Answers (1)

MaxU - stand with Ukraine

Reputation: 210842

Demo:

In [98]: df = pd.DataFrame(np.random.randn(10, 10), columns=list('abcdefghij'))

In [99]: df
Out[99]:
          a         b         c         d         e         f         g         h         i         j
0  0.385203  1.187572 -1.727850  0.623870 -1.042432  0.016608  0.968118  0.551275  0.419904 -1.411984
1 -1.572881  0.187265 -1.578968  0.405994 -0.502633  0.595827 -0.405670  0.491843 -0.145028 -2.097630
2  0.302688 -0.616390 -0.296095  0.702851 -1.269653  1.030805 -1.830220  2.192292 -0.161340  0.750929
3 -0.684007 -1.159139  1.844801 -1.289543  0.469358  0.153529  1.086689  0.246760  2.087439  0.083689
4  0.127821  0.377964  0.633427 -1.003018  0.251742 -0.912455  1.166675  0.327728  1.755409  2.071918
5  0.580320  1.086474  1.251722 -1.456155 -0.458268 -1.155363  1.199957 -2.016104 -0.265787  1.381885
6  0.438060 -1.687241 -1.529382 -0.670691 -1.443586  0.395569 -0.877185  0.227902  0.395737  0.461797
7 -0.566059  0.309534  2.008027  0.397227  0.937474  1.348306  1.403535  1.567550  1.356093  0.231540
8 -2.199514  0.088451  0.628223  0.625264  0.663697 -1.215756 -1.421302  0.729683 -1.241268 -0.367049
9 -1.405923  0.211969 -0.289390  0.946114  1.185240 -0.057775  0.488948  0.774187 -0.030490 -0.649153

In [100]: sums = (df.iloc[:, 2:7]
                    .sum()
                    .reset_index()
                    .set_axis(['student_type', 'enrollment'], axis=1))

In [101]: sums
Out[101]:
  student_type  enrollment
0            c    0.944514
1            d   -0.718088
2            e   -1.209061
3            f    0.199295
4            g    1.779544
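
Some background on why the original attempt fails: summing column-wise returns a Series, not a DataFrame, and the old column names end up as its index. A Series has no columns, so assigning to sums.columns appears to just attach a plain attribute, which would explain why no error shows up until the later lookup. As an alternative to the chain above, here is a minimal sketch that names the index and the values during the conversion. The small data frame is a made-up stand-in for the asker's data (only the column names come from the question), and it assumes a pandas version where reset_index accepts the name argument.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `data`; the column names come from
# the question, the values are invented for illustration.
data = pd.DataFrame({
    "SCH_ENR_HI_F": [10, 20],
    "SCH_ENR_AM_M": [1, 2],
    "SCH_ENR_AM_F": [3, 4],
})

# Summing column-wise returns a Series whose index holds the old column names.
sums = data.sum()
assert isinstance(sums, pd.Series)

# Name the index, then turn index and values into two labelled columns.
result = sums.rename_axis("student_type").reset_index(name="enrollment")

# The new column can now be referenced by name.
print(result["enrollment"].tolist())  # [30, 3, 7]
```

After this, result['enrollment'] and result['student_type'] behave like ordinary DataFrame columns.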

Upvotes: 1
